ESTIMATION  OF  GENERALIZED  SIMPLE  MEASUREMENT  ERROR 
MODELS  WITH  INSTRUMENTAL  VARIABLES 


By 

JEFFREY  RAY  THOMPSON 


A DISSERTATION  PRESENTED  TO  THE  GRADUATE  SCHOOL 
OF  THE  UNIVERSITY  OF  FLORIDA  IN  PARTIAL  FULFILLMENT 
OF  THE  REQUIREMENTS  FOR  THE  DEGREE  OF 
DOCTOR  OF  PHILOSOPHY 

UNIVERSITY  OF  FLORIDA 


2003 


Copyright  2003 
by 

Jeffrey  Ray  Thompson 


I dedicate  this  work  to  my  parents,  Ron  and  Ellie,  and  my  sister,  Jennifer. 


ACKNOWLEDGMENTS 

First  and  foremost,  I would  like  to  thank  Dr.  Randy  Carter  for  all  the  help 
required  in  writing  this  dissertation.  Dr.  Carter  has  been  an  outstanding  advisor, 
colleague,  and  friend  and  his  patience  and  willingness  to  teach  are  unparalleled. 
Without  his  guidance,  this  dissertation  would  not  have  been  possible.  He  has  given 
me  knowledge,  both  pertaining  to  statistics  and  otherwise,  that  I will  be  able  to 
apply  throughout  my  career  and  many  other  facets  of  my  life,  as  well.  Further  I 
would  like  to  thank  the  members  of  my  committee:  Dr.  Malay  Ghosh,  Dr.  Ramon 
Littell,  Dr.  Ron  Randles,  and  Dr.  Mike  Resnick.  Either  as  teachers  in  my  courses 
or  mentors  outside  of  the  classroom,  each  member  of  my  committee  has  played  a 
unique  and  special  role  in  my  time  here  as  a graduate  student. 

Second,  I would  like  to  thank  my  family.  My  parents,  Ellie  and  Ron,  and
sister,  Jennifer  (and  I guess  her  husband,  Matt,  too),  have  provided  an  infinite 
amount  of  support  and  encouragement  throughout  this  endeavor.  Even  at  my  most 
frustrating  times,  they  all  had  faith  in  me  and  gave  me  the  will  to  succeed.  There 
are  no  words  to  describe  how  thankful  I am  for  such  unwavering  support.  I guess  I
should  just  say,  “I  love  you  all  very  much!” 

I would  like  to  thank  all  my  friends  from  school,  especially  all  my  officemates
over  the  years  and  those  whom  I repeatedly  went  to  with  my  questions.  Whether
it  was  studying  for  the  comprehensive  or  qualifying  exams  together,  helping  me
through  the  toughest  of  classes,  or  simply  having  someone  who  has  been  there
before  to  relate  to,  I know  I would  not  have  made  it  this  far  without  them. 

Further  I would  like  to  thank  those  friends  of  mine  outside  the  world  of
statistics.  Both  my  life-long  friends  and  the  friends  I have  made  here  since  being  in
Gainesville  have  been  crucial  to  my  success  as  a graduate  student  for  being  there
with  me  when  I needed  to  blow  off  steam  or  helping  me  put  life  into  perspective
when  things  with  my  studies  got  difficult.

I would  also  like  to  thank  those  who  played  early  roles  in  placing  me  on  this 
career  path.  Whether  it  was  taking  me  under  their  wing  or  showing  me  what 
a great  teacher  really  is,  to  all  those  early  influences,  I owe  a great  amount  of 
gratitude. 

Finally,  I would  like  to  thank  my  cat  Bobbi,  who  unconditionally  was  always 
there  to  greet  me  at  my  door  at  the  end  of  the  most  trying  days  and  also  could 
help  put  a smile  on  my  face  at  the  times  I needed  it  most. 


TABLE  OF  CONTENTS 

page 

ACKNOWLEDGMENTS  iv 

ABSTRACT  viii 

CHAPTER 

1 INTRODUCTION:  MEASUREMENT  ERROR  MODELS 1 

1.1  Common  Occurrences  of  the  Measurement  Error  Problem 1 

1.2  Motivating  Examples 2 

1.3  A Known  Solution  6 

2 LITERATURE  REVIEW  AND  BACKGROUND 8 

2.1  The  Multivariate  Measurement  Error  Model 8 

2.2  The  Simple  Linear  ME  Model  with  Normal  Errors  and  True  Values  9 

2.2.1  Non-Identifiability  and  Asymptotic  Bias  of  the  OLS  Estimator  11

2.2.2  Additional  Information  Required  for  Identifiability 13 

2.3  The  Multivariable  Linear  ME  Model  Under  Normality 22 

2.3.1  Non-Identifiability  and  Bias 23 

2.3.2  Additional  Information  Required 24 

2.4  Nonlinear  ME  Models 27 

2.4.1  Normal  Theory  Models:  Non-Identifiability  and  Bias  28

2.4.2  Normal  Theory  Models:  Additional  Information  Required  29

2.4.3  Exponential  Family  Models:  Non-Identifiability  and  Bias  32

2.4.4  Exponential  Family  Models:  Additional  Information  Required  34

3 GENERALIZED  SIMPLE  M.E.  MODEL  WITH  INSTRUMENTAL  VARIABLES  42

3.1  The  Model 42 

3.2  Applications  of  the  GSME  Model 47 

3.2.1  Exponential  Family  General  Linear  Models 48 

3.2.2  Multinomial  Model 49 

3.2.3  Survival  Time  51 

3.2.4  Multivariate  Regression  Model 53 

3.2.5  Nonlinear  Model 55 

3.3  Model  Identification 59 




4 ESTIMATION  OF  THE  G.S.M.E.  MODEL 66 

4.1  The  Univariate  Case  69 

4.1.1  Translation  to  Latent  Class  Model 69 

4.1.2  Estimation  of  Model  Parameters 74 

4.1.3  Integration  Examples  of  the  Categorization  Step 84 

4.2  Generalization  to  Estimating  a Multivariate  Model  90 

4.3  Asymptotic  Properties 94 

4.4  Consistent  Initial  Estimates 110 

5 APPLICATION  TO  MOTIVATING  EXAMPLE  112 

6 OTHER  POSSIBLE  APPLICATIONS 121 

6.1  Application  for  MCHERDC  with  Developmental  Delay  and  Disability  Outcome  121

6.2  Application  to  Alzheimer’s  Disease  126 

7 DISCUSSION  AND  FUTURE  WORK 130 

7.1  Discussion  of  the  GSME  Model  130 

7.2  Monte  Carlo  Study  and  Asymptotic  Efficiency 132 

7.3  The  GSME  Model  with  Error-Free  Covariates 133 

7.4  An  Alternate  Method 136 

REFERENCES 143 

BIOGRAPHICAL  SKETCH 146 




Abstract  of  Dissertation  Presented  to  the  Graduate  School 
of  the  University  of  Florida  in  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of  Doctor  of  Philosophy 

ESTIMATION  OF  GENERALIZED  SIMPLE  MEASUREMENT  ERROR 
MODELS  WITH  INSTRUMENTAL  VARIABLES 

By 

Jeffrey  Ray  Thompson 
August  2003 

Chair:  Randy  L.  Carter 
Major  Department:  Statistics 

Measurement  error  (ME)  models  are  used  in  situations  where  at  least  one 
independent  variable  in  the  model  is  imprecisely  measured.  Having  at  least  one 
independent  variable  measured  with  error  leads  to  an  unidentified  model  and  a 
bias  in  the  naive  estimate  of  the  effect  of  the  variable  that  is  measured  with  error. 
One  way  to  correct  these  problems  is  through  the  use  of  an  instrumental  variable 
(IV).  An  IV  is  one  that  is  correlated  with  the  unknown,  or  latent,  true  variable, 
but  uncorrelated  with  the  measurement  error  of  the  unknown  truth  and  the  model 
error.  An  IV  provides  the  identifying  information  in  our  method  of  estimating  the 
parameters  for  generalized  simple  measurement  error  (GSME)  models.  The  GSME 
model  is  developed  and  it  is  shown  how  many  well  studied  ME  models  with  one 
predictor  can  fit  into  its  framework.  Included  in  these  are  linear,  generalized  linear, 
nonlinear,  multinomial,  multivariate  regression,  and  other  ME  models.  The  GSME 
model, by design, can handle situations with continuous, discrete, and categorical
observable, or manifest, variables. We provide theorems that give conditions under
which the GSME model is identified. The initial step in our estimation method is
to “categorize” all continuous and discrete variables. Categorical variables remain
unchanged. Assuming conditional independence given the latent variable, the
joint distribution of the categorized manifest variables and any that were already
categorical is the product of the conditional cell probabilities and conditional
distributions of the categorized continuous and discrete manifest variables, summed
over the categorical values of the latent variable. Maximum likelihood estimates
of  the  joint  categorical  distribution  are  used  to  solve  nonlinear  equations  for  the 
parameters  of  interest  which  enter  through  the  conditional  probabilities.  Estimated 
generalized  nonlinear  least  squares  is  used  to  solve  the  equations  for  the  parameters 
of  interest.  We  show  that  our  estimators  have  favorable  asymptotic  properties  and 
develop  methods  of  inference  for  them.  We  show  how  many  commonly  studied 
ME  model  problems  fit  into  the  general  framework  developed  and  how  they  can  be 
solved  using  our  method. 




CHAPTER  1 

INTRODUCTION:  MEASUREMENT  ERROR  MODELS 
1.1  Common  Occurrences  of  the  Measurement  Error  Problem 
Measurement  error  models  are  models  with  at  least  one  independent  variable 
that  is  measured  with  error.  Aside  from  the  term  measurement  error  models, 
this  area  of  statistics  is  known  by  several  other  names:  errors  in  variables  and 
regression  with  errors  in  x are  but  two  others  [56].  The  measurement  error  problem 
occurs  often  in  practice.  The  medical  field,  agriculture,  and  econometrics  are 
just  a few  areas  that  contain  problems  involving  predictor  variables  which  are 
contaminated  by  measurement  error. 

The  general  measurement  error  problem  occurs  often  in  medical  research 
when  the  association  between  an  imperfectly  measured  predictor  variable  and  a 
response  variable  is  of  interest.  One  of  many  examples  involves  the  occurrence 
of  Alzheimer’s  disease.  Alzheimer’s  disease  affects  roughly  fifteen  million  people 
worldwide  and  over  four  million  in  the  United  States  alone  [32].  One  in  ten 
persons over the age of sixty-five has the disease, and it is also the primary cause
of dementia in the elderly [4]. It is thought that the level of aluminum deposits,
which may build up in the brain over time, has an effect on the likelihood of an
individual developing Alzheimer’s [12]. If one were to estimate the association
between the likelihood of Alzheimer’s disease and the level of aluminum deposits in
the  brain,  problems  would  arise  because  a perfectly  accurate  measure  of  aluminum 
levels  is  simply  not  attainable.  Measurement  error  models  have  also  been  used  to 
assess the reliability and validity of a measuring method in the absence of a true
“gold standard” method. Carter [17], for example, assessed the reliability and
validity of a measure of specific activity of the enzyme sucrase that was obtained
from a homogenated sample of the small intestine of intestinal bypass patients.

There  are  also  numerous  examples  of  measurement  error  models  occurring 
in  agriculture.  In  one  such  example,  considered  by  Fuller  [24],  the  experience 
and  education  of  a farmer  were  used  as  predictors  in  estimating  farm  size.  In 
this  example,  random  error  was  added  to  each  variable  in  order  to  protect  the 
confidentiality  of  the  respondents.  Therefore,  two  types  of  errors  existed  in  the 
explanatory  variables:  the  error  added  to  protect  confidentiality  and  that  of  the 
original  responses  of  experience  and  education  as  stated  by  individual  farmers. 

In  economics,  one  might  be  interested  in  expressing,  say,  the  net  exports  of 
a commodity  as  a function  of  the  available  resources  to  a country.  Klepper  and 
Leamer  [36]  faced  this  problem  when  estimating  the  amount  of  machinery  exported
by  a country  as  a linear  function  of  the  country’s  land,  labor,  and  capital.  They 
could  not  apply  usual  linear  regression  techniques  to  correctly  estimate  the  amount 
of  machinery  exported,  because  these  three  explanatory  variables  were,  in  their 
words,  “doubtlessly  measured  with  error”  [36,  p.  180]. 

1.2  Motivating  Examples 

The Radiation Effects Research Foundation (RERF) in Hiroshima, Japan,
is involved in radiation dose-response studies using the atomic-bomb survivor
data. Information from residents of both Hiroshima and Nagasaki was obtained
mostly from direct interviews with survivors, but also from interviews with family
members and acquaintances. Shielding information and distance from the bomb’s
hypocenter were obtained from such interviews and were used to estimate each
individual’s  radiation  dose  [41].  These  estimates  of  individual  radiation  doses  have 
substantial random errors [43]. The random errors in dose estimates are due largely
to uncertainties regarding the survivors’ location relative to the hypocenter and
shielding [42] and to imperfect estimation of true dose from these variables and
other error-free variables. Sposto et al. [54] estimated and discussed the magnitude
of the errors in the radiation dose estimates.

Pierce  et  al.  [43]  were  interested,  primarily,  in  cancer  incidence  and  mortality
rates,  where  the  number  of  cases  in  a given  time  interval  was  assumed  to  follow 
a Poisson  distribution,  but  they  also  gave  some  consideration  to  the  analysis  of 
data  for  chromosomal  aberrations,  in  which  the  number  of  cases  was  assumed 
to  be  binomial.  For  example,  in  the  analysis  of  chromosomal  aberration  data, 
the  outcome  was  the  proportion  of  about  100  examined  cells  that  exhibit  an 
aberration,  and  this  outcome  was  assumed  to  follow  a binomial  distribution  [42, 
p.  279].  Modeling  these  outcomes  involved  mainly  generalized  linear  models  with 
varying  assumptions  depending  on  the  outcome  of  interest,  but  they  also  briefly 
mention  applications  in  survival  analysis,  using  a proportional  hazards  model. 

One  of  the  primary  concerns  was  the  incidence  of  leukemia  which  was  modeled  as 
the  hazard  function  of  a grouped  survival  analysis.  The  outcome  was  time  until 
the  diagnosis  of  leukemia  and  it  will  serve  as  the  primary  outcome  of  interest  in 
our  discussion.  Pierce  et  al.  [43]  discussed  examples  where  the  number  of  cases 
follows  a binomial  (for  chromosomal  aberration  data)  and  a Poisson  (for  cancer 
incidence  and  mortality)  distribution,  but  in  both  cases,  they  assumed  an  identity 
link function so the Y’s, i.e., the outcomes, have conditional expectations that are
linear in x. A Weibull distribution, with slightly different parameters depending
on city, was assumed for true radiation dose, x, and multiplicative errors were
assumed for estimated dose that were independent from individual to individual.
The unobservable x was the true radiation dose and the estimated value, X, was
the estimated dose from the Dosimetry System 1986, henceforth, DS86. Their
approach involved replacing each individual’s true dose, x_i, by an estimate of
E(x_i | X_i) and carrying out a weighted least squares analysis as if the estimate were
the true dose. Estimation of E(x_i | X_i) was possible because the distribution of X_i
given x_i was assumed to follow a lognormal distribution with a known coefficient
of variation. (The authors recognized that the coefficient of variation actually is
not known, and performed sensitivity analyses to assess the potential impact of a
violation of this assumption [43, p. 352].) A value for E(x | X) was obtained from
the relationship f(x | X) ∝ f(x) f(X | x). A lognormal distribution with known
coefficient of variation means that log(X) is assumed to be normally distributed
with conditional mean log(x), with X having a coefficient of variation assumed, in
this case, to be between 30% and 40%. Under this range of values for a lognormal
distribution, the standard deviation of log(X) is approximately equal to the
coefficient of variation. For the precise relationship for lognormal models, see
Pierce et al. [42, p. 278].

This  approach  is  an  example  use  of  a method  known  as  the  regression  calibration 
method,  which  will  be  discussed  briefly  in  the  next  section  and  more  thoroughly  in 
the  next  chapter.  The  authors  also  gave  results  for  two  distributions  which  were 
not  lognormal  (namely,  contaminated  lognormal  and  normal).  Ultimately,  their 
recommendation  was  to  use  a lognormal  distribution  with  a 35%  coefficient  of 
variation. 
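
The approximate equality between the standard deviation of log(X) and the
coefficient of variation mentioned above can be checked numerically. The sketch
below is only an illustration: it uses the standard lognormal identity
CV^2 = exp(sigma^2) - 1, which is not quoted from Pierce et al., and the function
names are hypothetical.

```python
import math


def lognormal_sigma_from_cv(cv):
    """Standard deviation of log(X) for a lognormal X with coefficient of variation cv."""
    # Standard lognormal identity: CV^2 = exp(sigma^2) - 1.
    return math.sqrt(math.log(1.0 + cv ** 2))


def lognormal_cv_from_sigma(sigma):
    """Coefficient of variation of a lognormal X whose log has standard deviation sigma."""
    return math.sqrt(math.exp(sigma ** 2) - 1.0)


# Coefficients of variation in the range discussed in the text (30% to 40%).
for cv in (0.30, 0.35, 0.40):
    sigma = lognormal_sigma_from_cv(cv)
    print(f"CV = {cv:.2f}  ->  SD of log(X) = {sigma:.4f}")
```

For coefficients of variation of 30%, 35%, and 40% the implied standard
deviations of log(X) are roughly 0.29, 0.34, and 0.39, so the two quantities
differ by only a few percent over this range.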

There  are  two  key  extensions  to  their  method  that  they  mention  but  do  not 
apply.  The  first  concerned  the  need  to  add  covariates  measured  without  error  to 
their  model.  In  the  current  context,  these  covariates  would  be  sex,  city,  and  age 
ATB and are denoted collectively by Z. Pierce et al. [43] mentioned that in the
presence of such variables, the estimate of E(x | X) should be replaced in their
method with an estimate of E(x | X, Z). If Z is in the model for E(Y), they state
that using E(x | X, Z) should reduce the bias in parameter estimates. They do not
pursue  this  idea  further. 

Another useful suggestion they made, but did not pursue, was the use of a sign
of acute radiation effects as a possible instrumental variable (IV) in order to avoid
the need to assume a known coefficient of variation. One possible sign of acute
radiation effects is the presence or absence of severe epilation. Severe epilation is
defined to be scalp hair loss of more than 67% [41]. This variable does seem to
satisfy the requirements of an IV. Severe epilation is related to the true radiation
dose but presumably independent of the measurement error in dose and of the model
error and measurement error in Y, if any. Not only do Pierce et al. not pursue the
use of an IV, they do not make any suggestions as to how to incorporate an IV into
their method.

Another possible application of our methods comes from the area of maternal
and child health epidemiology. Consider as an outcome an indicator of early
childhood disability or developmental delay (DDD). It is thought that developmental
delay and disability in early childhood may be related to gestational age (GA)
(see, for example, [66, 30, 7]). Because GA is known to be measured with error,
however, researchers typically use birth weight as the predictor of interest (see, for
example, [60, 61]). Such studies are useful and informative in their own right, but
do not answer questions concerning the relationship between the likelihood of DDD
and prematurity. Thus, GA would be an ideal variable for assessing this
relationship, except that most existing studies do not account for the fact that GA
is often inaccurately estimated.

The  methods  presented  here  may  offer  a solution  to  this  dilemma.  Since 
there  exist  multiple  methods  for  estimating  GA  (see,  for  example,  [53,  39]),  a 
possible  consideration  for  an  IV  may  be  a second  measure  of  GA,  obtained  from  an 
independent method. In Chapter 6 we discuss this possible application and show
how our methods may be applied to the outcome of interest, DDD, in a future
study that would estimate the effects of prematurity on a child’s risk of DDD in the
first three years of life.




1.3  A Known  Solution 

The  solution  provided  by  Pierce  et  al.  [43]  in  the  atomic  bomb  study  is  an 
example  application  of  a methodology  first  proposed  by  Prentice  [44]  in  the  context 
of  a proportional  hazards  model  and  subsequently  generalized  by  Carroll  et  al.  [13]. 
The  method  is  called  the  regression  calibration  method.  It  amounts  to  estimating 
E(x  | .)  and  then  substituting  the  estimates  for  values  of  x in  the  analysis  that 
would  have  been  done  if  x had  been  observed  without  error.  Unfortunately,  E(x  | .) 
cannot  be  estimated  without  additional  information.  The  additional  information 
needed  can  exist  in  the  form  of  a known  or  independently  estimated  parameter 
(Pierce  et  al.  [43],  for  example,  assumed  that  the  coefficient  of  variation  of  the
conditional  distribution  of  x given  X was  known),  repeated  measures  of  x,  or  an 
instrumental  variable  (i.e.,  a variable  that  is  related  to  x but  independent  of  the 
measurement  error  in  x and  independent  of  the  model  error  and  measurement  error 
in  Y,  if  any).  The  regression  calibration  method  can  be  applied  to  either  linear  or 
nonlinear  models. 
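
As a rough illustration of the substitution step, the sketch below works in the
simplest setting, which is an assumption made here and not the setting of Pierce
et al.: a linear model for Y, normally distributed x, additive normal measurement
error, and a reliability ratio var(x)/var(X) treated as known (standing in for the
"additional information"). Under those assumptions E(x | X) is linear in X. All
function names and numeric values are hypothetical.

```python
import random


def ols_slope(pred, resp):
    """Ordinary least squares slope of resp on pred."""
    n = len(pred)
    pbar = sum(pred) / n
    rbar = sum(resp) / n
    s_pr = sum((p - pbar) * (r - rbar) for p, r in zip(pred, resp))
    s_pp = sum((p - pbar) ** 2 for p in pred)
    return s_pr / s_pp


def regression_calibration_slope(X_obs, Y, reliability):
    """Replace each X by an estimate of E(x | X), then run the usual analysis."""
    xbar = sum(X_obs) / len(X_obs)
    # Under joint normality, E(x | X) = mean + reliability * (X - mean);
    # the sample mean of X is used in place of the unknown mean of x.
    x_hat = [xbar + reliability * (X - xbar) for X in X_obs]
    return ols_slope(x_hat, Y)


# Hypothetical simulation settings (not taken from the dissertation).
rng = random.Random(2003)
beta0, beta1 = 1.0, 2.0
var_x, var_u, var_e = 1.0, 1.0, 1.0
reliability = var_x / (var_x + var_u)  # assumed known

x_true = [rng.gauss(5.0, var_x ** 0.5) for _ in range(20000)]
X_obs = [x + rng.gauss(0.0, var_u ** 0.5) for x in x_true]
Y = [beta0 + beta1 * x + rng.gauss(0.0, var_e ** 0.5) for x in x_true]

naive = ols_slope(X_obs, Y)
calibrated = regression_calibration_slope(X_obs, Y, reliability)
print(f"naive slope: {naive:.3f}, calibrated slope: {calibrated:.3f}")
```

In this linear case the calibrated slope is simply the naive slope divided by the
reliability ratio, which is why the method cannot proceed when that ratio (or an
equivalent piece of identifying information) is unavailable.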

It should be noted that E(x | .) = E(x | X) if there are no covariates in
the model that are measured without error (Z) and no instrumental variables (W,
say). If there are error-free covariates but no IV, then E(x | .) = E(x | X, Z);
E(x | .) = E(x | X, W) if there is an IV but no error-free covariates; and
E(x | .) = E(x | X, Z, W) in the presence of both.

The methods proposed by Carroll et al. [13] for IV estimation rely heavily
on two rather restrictive assumptions. The first is that, when their methods are
applied to generalized linear models, they consider only canonical link functions.
Therefore, they write

$$E(Y \mid Z, x) = b'\left(\beta_{Y|\underline{1}Zx} + Z'\beta_{Y|1\underline{Z}x} + x'\beta_{Y|1Z\underline{x}}\right),$$

where $\beta_{Y|\underline{1}Zx}$ is the coefficient of 1, i.e., the intercept, in the
regression of Y on 1, Z, and x, $\beta_{Y|1\underline{Z}x}$ is the coefficient of Z' in the
same regression, and $\beta_{Y|1Z\underline{x}}$ is the coefficient of x'.
The other restriction is that there is a linear regression of x on (Z, W, X). In other
words, they assume that the covariates measured with error, any covariates
measured without error, the instrumental variables, and the observed measurements
of the x’s have the following relation:

$$E(x \mid Z, W, X) = \beta_{x|\underline{1}ZWX} + Z'\beta_{x|1\underline{Z}WX} + W'\beta_{x|1Z\underline{W}X} + X'\beta_{x|1ZW\underline{X}}.$$

The coefficients are defined similarly to those above. This assumption of a linear
relation between these variables is usually incorrect if the x are categorical. This
is easily seen when x is dichotomous. Also, they assume that X = x + u and
E(x - X | Z, x, W) = 0, and, therefore, that X is an unbiased measurement of
x. The methods proposed here require neither a canonical link (when applied to
GLMs), nor a strictly linear relation among x, the observed covariates, and the
IVs, nor an unbiased measurement of x. The key assumption is that of conditional
independence, which is also an assumption made by Carroll et al. [13].

In Chapter 2, we define the general multivariate measurement error model
for the mean function of a response variable. We then examine several
estimators under specific measurement error models, starting with the simplest
case, and discuss their properties.


CHAPTER  2 

LITERATURE  REVIEW  AND  BACKGROUND 
2.1  The  Multivariate  Measurement  Error  Model 
Let $\{Y_i\}_{i=1}^{\infty}$ and $\{e_i\}_{i=1}^{\infty}$ be sequences of p-dimensional random
column vectors and let $\{x_i\}_{i=1}^{\infty}$ be a sequence of k-dimensional independent
column vectors. Further, let $\beta \in \Psi \subset R^{kp}$ be a kp x 1 column vector of
parameters and f be a p-dimensional vector whose components are real valued Borel
measurable functions mapping $R^k \times \Psi$ into $R^1$ and describing $E(Y_i \mid x_i)$
as a function of $x_i$. Then

$$Y_i = f(x_i; \beta) + e_i, \quad i = 1, 2, \ldots, \tag{2.1}$$

where $e_i$ may be a combination of model error and measurement error, if it exists.
Also, assume

$$E(e_{ij}) = E(x_{il} e_{i'j}) = 0 \tag{2.2}$$

for all j = 1, 2, ..., p, l = 1, 2, ..., k, i = 1, 2, ..., and i' = 1, 2, .... The
model defined by Equations 2.1 and 2.2 is called the multivariate nonlinear
regression model. Suppose, in addition, that the elements of $\beta$ are not
functionally related and

$$f(x_i; \beta) = B x_i, \quad i = 1, 2, \ldots, \tag{2.3}$$

where B is a p x k matrix of parameters formed by placing the first k elements
of the kp x 1 vector $\beta$ in the first row of B, the next k in the second row, and
so on. With the addition of Equation 2.3, the model is called the multivariate
linear regression model. Note that if the model contains an intercept, the first
element of $x_i$ is unity. It is well known that, under the assumption of normally
distributed model error, the least squares estimator for B when the $x_i$ are fixed
(least squares itself does not require the normality assumption) and the maximum
likelihood estimator when the $x_i$ are random drawings from a normal distribution
are equal, and both are unbiased [24]. It sometimes is the case that the vectors
$x_i$, i = 1, 2, ..., in Equation 2.3 are not directly observable, but are observable
with error. This leads to the so-called linear errors in variables model.

When the vectors $x_i$, i = 1, 2, ..., are not directly observable, one may observe,
in a finite sample,

$$X_i = x_i + u_i, \quad i = 1, 2, \ldots, n. \tag{2.4}$$

In addition to the first assumption in 2.2, we assume that the k-vectors $u_i$,
i = 1, 2, ..., n, of measurement error random variables are such that

$$E(u_{il}) = E(u_{il} x_{i'l'}) = E(u_{il} Y_{i'j}) = 0 \tag{2.5}$$

for all j = 1, 2, ..., p, l = 1, 2, ..., k, l' = 1, 2, ..., k, i = 1, 2, ..., n, and
i' = 1, 2, ..., n. The model defined by Equations 2.1, 2.2, 2.4, and 2.5 is called the
general measurement error (henceforth, ME) model. With the addition of Equation
2.3, the model is called the linear measurement error model (linear ME). When the
unobservable $x_i$ are fixed, the model is known as the functional model and when
they are nonconstant random vectors, the model is known as the structural model.

2.2  The  Simple  Linear  ME  Model  with  Normal  Errors  and  True  Values 
The majority of this section will be devoted to the structural model, but it
is nonetheless important to distinguish between the structural and functional
models. Again, the $(x_1, x_2, \ldots)$ in Equation 2.1 may be observations on non-random
variables, in which case the model is defined to be the functional model. In this
case, each $x_i$ can be thought of as a constant vector, and the $x_i$ appear as
parameters in the distribution function of $(Y_i, X_i)$. In the structural model with
normal true values, the $x_i$ are assumed to be independent drawings from a
$N(\mu_x, \Sigma_{xx})$ distribution. The name “structural” comes from the fact that these
models describe the structure of the specified relationship between the random
variables x and Y.

For illustrative purposes, we shall begin with the simple linear ME model
which contains one dependent and one independent variable and has normally
distributed model error. Consider the model defined by

$$Y_i = \beta_0 + \beta_1 x_i + e_i, \tag{2.6}$$

$$X_i = x_i + u_i, \tag{2.7}$$

where i = 1, 2, ..., n, the $e_i$ are independent $N(0, \sigma_{ee})$ and may potentially be a
combination of model and measurement error, and $u_i$ is a $N(0, \sigma_{uu})$ random
variable. So, at this point, we have the following assumptions:


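$$\begin{pmatrix} x_i \\ e_i \\ u_i \end{pmatrix} \overset{iid}{\sim} N\left[\begin{pmatrix} \mu_x \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{xx} & \sigma_{xe} & \sigma_{xu} \\ \sigma_{xe} & \sigma_{ee} & \sigma_{eu} \\ \sigma_{xu} & \sigma_{eu} & \sigma_{uu} \end{pmatrix}\right], \tag{2.8}$$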


where “~iid” means “independently and identically distributed” and N means
“normally” distributed. For some estimation purposes, it may be assumed that
$x_i$, $e_i$, and $u_i$ are independent for all i. When there is no measurement error (i.e.,
the classical regression model), it is well known that the least squares estimator of
$\beta_1$ is

$$\hat{\beta}_1 = \left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{-1} \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}).$$

This estimator is also the maximum likelihood estimator and is unbiased for $\beta_1$ in
both the functional and structural models.


In addition to the assumptions in (2.8) that the $(x_1, x_2, \ldots, x_n)$ in Equations
2.6 and 2.7 are random (i.e., they are drawn from a $N(\mu_x, \sigma_{xx})$ distribution) we
may in some applications also assume that $x_i$, $e_i$, and $u_i$ are independent for all i.
Then we have that the vector

$$(x_i, e_i, u_i)' \overset{iid}{\sim} N\{(\mu_x, 0, 0)', \mathrm{diag}(\sigma_{xx}, \sigma_{ee}, \sigma_{uu})\}, \tag{2.9}$$

where diag() represents a diagonal matrix with the given elements on the diagonal.
Then the vector $(Y_i, X_i)'$, where $Y_i$ is defined by Equation 2.6 and $X_i$ is defined by
Equation 2.7, has a bivariate normal distribution with mean vector
$E\{(Y_i, X_i)'\} = (\mu_Y, \mu_X)' = (\beta_0 + \beta_1 \mu_x, \mu_x)'$ and variance-covariance matrix

$$\mathrm{Var}\{(Y_i, X_i)'\} = \begin{pmatrix} \sigma_{YY} & \sigma_{XY} \\ \sigma_{XY} & \sigma_{XX} \end{pmatrix} = \begin{pmatrix} \beta_1^2 \sigma_{xx} + \sigma_{ee} & \beta_1 \sigma_{xx} \\ \beta_1 \sigma_{xx} & \sigma_{xx} + \sigma_{uu} \end{pmatrix}.$$


2.2.1  Non-Identifiability  and  Asymptotic  Bias  of  the  OLS  Estimator

Under the model defined by Equations 2.6 and 2.7 and under the assumptions
stated in (2.9), using the observed variables, one might naively estimate the
regression coefficient, $\beta_1$, as

$$\hat{\beta}_{1,OLS} = \left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]^{-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}). \tag{2.10}$$

By the properties of the bivariate normal distribution, this naive estimator has
expected value $E\{\hat{\beta}_{1,OLS}\} = \sigma_{XX}^{-1}\sigma_{XY} = \beta_1(\sigma_{xx} + \sigma_{uu})^{-1}\sigma_{xx}$. One can see
that, because the denominator is inflated by $\sigma_{uu}$, the least squares regression
coefficient is biased towards zero, and that the bias does not vanish with increasing
sample size. It is important to note that key assumptions used to derive the
properties of this estimator were that the measurement error, $u_i$, was independent
of both the true values, $x_i$, and the model errors, $e_i$. This bias towards zero in the
regression coefficient is also referred to as attenuation of the coefficient toward
zero. The ratio $\kappa_{xx} = \sigma_{XX}^{-1}\sigma_{xx}$, which defines the degree of attenuation, is known
as the reliability ratio, and measures the reliability of X as a measurement of x.
Note that the ratio ranges from zero to 1.0 with larger values indicating greater
reliability of measurement.
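
The attenuation described above can be seen in a small simulation. The sketch
below is illustrative only: it draws data from the simple linear ME model of
Equations 2.6 and 2.7 with independent normal x, e, and u, and compares the naive
least squares slope with the value obtained by multiplying the true slope by the
reliability ratio. The parameter values, sample size, and function names are
hypothetical choices, and the intercept and mean of x are set to zero for brevity.

```python
import random


def ols_slope(pred, resp):
    """Ordinary least squares slope of resp on pred."""
    n = len(pred)
    pbar = sum(pred) / n
    rbar = sum(resp) / n
    s_pr = sum((p - pbar) * (r - rbar) for p, r in zip(pred, resp))
    s_pp = sum((p - pbar) ** 2 for p in pred)
    return s_pr / s_pp


def attenuated_slope(beta1, var_x, var_u):
    """True slope multiplied by the reliability ratio var_x / (var_x + var_u)."""
    return beta1 * var_x / (var_x + var_u)


def simulated_naive_slope(beta1, var_x, var_u, var_e, n=20000, seed=1):
    """Naive OLS slope of Y on the error-prone X from one simulated sample."""
    rng = random.Random(seed)
    X_obs, Y = [], []
    for _ in range(n):
        x = rng.gauss(0.0, var_x ** 0.5)                     # true value
        X_obs.append(x + rng.gauss(0.0, var_u ** 0.5))       # X = x + u
        Y.append(beta1 * x + rng.gauss(0.0, var_e ** 0.5))   # Y = beta1 * x + e
    return ols_slope(X_obs, Y)


beta1, var_x, var_e = 2.0, 1.0, 1.0
for var_u in (0.0, 0.5, 1.0, 2.0):
    expected = attenuated_slope(beta1, var_x, var_u)
    observed = simulated_naive_slope(beta1, var_x, var_u, var_e)
    print(f"var_u = {var_u:.1f}: attenuated slope {expected:.3f}, simulated {observed:.3f}")
```

As the measurement error variance grows, the reliability ratio shrinks and the
naive slope moves further toward zero; increasing the sample size tightens the
simulated value around the attenuated slope, not around the true slope.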

Another important concept that we shall discuss is that of identifiability of a
model. Fuller [24, p. 9-10] provides the following definition of identifiability: “Let
$Z$ be the vector of observable random variables and let $F_Z(a : \theta)$ be the distribution
function of $Z$, evaluated at $a$, for the given parameter $\theta$ in the parameter space
$\Theta$. The parameter $\theta$ is identified if, for any $\theta_1 \in \Theta$ and $\theta_2 \in \Theta$, $\theta_1 \neq \theta_2$ implies
that $F_Z(a : \theta_1) \neq F_Z(a : \theta_2)$ for some $a$.” Further, we say that if the vector $\theta$
is identified, then the “model” is identified [24, p. 10]. By the term “model,” we
mean a specification of the variables and parameters of interest, the relationships
among the variables, and the assumptions about the stochastic properties of
the random variables. For the structural model defined by Equations 2.6 and
2.7, the vector of unknown parameters, $\theta$, is given by $(\mu_x, \sigma_{xx}, \sigma_{ee}, \sigma_{uu}, \beta_0, \beta_1)$.
Under the assumptions of the stated model, the observations $(Y_i, X_i)$ have a
bivariate normal distribution. A bivariate normal distribution is completely
characterized by the elements of its mean vector and variance-covariance matrix.
Thus the distribution of $(Y_i, X_i)$ is characterized by the five parameters in its mean
vector and variance-covariance matrix, namely, $(\mu_Y, \mu_X, \sigma_{YY}, \sigma_{XX}, \sigma_{XY})$. Because
the model contains six different parameters in the vector $\theta$, it is not possible to find
a unique relationship between $\theta$ and the parameters of the distribution of $(Y_i, X_i)$,
which number five. In other words, there exist multiple parametric configurations
(i.e., different $\theta$) that would lead to the same distribution of the observations. So by
applying the definition of identifiability, there exist vectors $\theta_1 \in \Theta$ and $\theta_2 \in \Theta$,
such that $\theta_1 \neq \theta_2$, but $F_Z(a : \theta_1) = F_Z(a : \theta_2)$, for all $a$. Thus, the model is not
identified.
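
The non-identifiability argument can be checked with a small numerical example. The Python sketch below maps a parameter vector $\theta$ to the five moments that characterize the distribution of $(Y_i, X_i)$, using the moment relations of the model with mutually independent $x_i$, $e_i$, and $u_i$. The two parameter vectors are hypothetical values picked only so that the arithmetic is exact.

```python
def observed_moments(mu_x, sigma_xx, sigma_ee, sigma_uu, beta0, beta1):
    """Moments of (Y, X) implied by theta when x, e and u are independent."""
    mu_Y = beta0 + beta1 * mu_x
    mu_X = mu_x
    sigma_YY = beta1 ** 2 * sigma_xx + sigma_ee
    sigma_XY = beta1 * sigma_xx
    sigma_XX = sigma_xx + sigma_uu
    return (mu_Y, mu_X, sigma_YY, sigma_XY, sigma_XX)

# Two different points in the parameter space (illustrative values only).
theta_a = dict(mu_x=1.0, sigma_xx=1.0, sigma_ee=1.0, sigma_uu=3.0, beta0=0.0, beta1=2.0)
theta_b = dict(mu_x=1.0, sigma_xx=2.0, sigma_ee=3.0, sigma_uu=2.0, beta0=1.0, beta1=1.0)

moments_a = observed_moments(**theta_a)
moments_b = observed_moments(**theta_b)
print(moments_a)
print(moments_b)
```

Both parameter vectors produce the same mean vector and covariance matrix, hence the same bivariate normal distribution for the observations, even though the slopes differ by a factor of two.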

It is possible, however, that certain individual parameters are identifiable.
For this model, $\mu_x$ is identified since the mean of $x$ is equal to the mean of $X$. In
order to construct a consistent estimator for the entire vector $\theta$, however, one must
specify additional information for the model.

2.2.2  Additional  Information  Required  for  Identifiability 
Known  Parameters 

Identifying information could come in the form of known parameters, such
as known measurement error variance, $\sigma_{uu}$, or reliability coefficient, $\kappa_{xx}$, for the
measured value of $x_i$.

If the reliability ratio, $\kappa_{xx} = \sigma_{XX}^{-1}\sigma_{xx}$, defined in Section 2.2.1, is assumed
to be known, then the unbiased estimator for the regression coefficient $\beta_1$, in the
structural model, is given by

$$\hat{\beta}_1 = \kappa_{xx}^{-1}\hat{\beta}_{1,OLS},$$

where $\hat{\beta}_{1,OLS}$ is the least squares coefficient defined by Equation 2.10. Here, $\hat{\beta}_1$ is
referred to as the regression coefficient corrected for attenuation.

Now let us assume $\sigma_{uu}$ is known. Under the given model, $Z_i = (Y_i, X_i)$ has
a bivariate normal distribution and therefore the sample mean $\bar{Z} = (\bar{Y}, \bar{X})$ and
the sample covariances $(m_{YY}, m_{XY}, m_{XX})$ form a set of sufficient statistics for
parameter estimation. The sample covariances are computed in their usual way,
e.g., $m_{XY} = (n - 1)^{-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$. Fuller [24, p. 14] refers to $m_{YY}$, $m_{XY}$,
and $m_{XX}$ as the maximum likelihood estimators adjusted for degrees of freedom.
This is due to the fact that when there are no parametric restrictions on the
covariance matrix of $Z_i$, $n^{-1}(n - 1)m_{ZZ}$ is the maximum likelihood estimator of the
covariance matrix of $Z_i$. When the parameter vector is identified, the maximum
likelihood estimator will be a function of the sufficient statistics. Recall that the
population moments of $(Y_i, X_i)$ under the model defined by 2.6 and 2.7 satisfy

$$\begin{aligned}
(\sigma_{YY}, \sigma_{XY}, \sigma_{XX}) &= (\beta_1^2\sigma_{xx} + \sigma_{ee},\ \beta_1\sigma_{xx},\ \sigma_{xx} + \sigma_{uu}), \\
(\mu_Y, \mu_X) &= (\beta_0 + \beta_1\mu_x,\ \mu_x).
\end{aligned} \tag{2.11}$$




By substituting the sample estimators of the unknown population moments
into the left hand side of Equation 2.11, one creates a system of equations that
can be solved to obtain the parameter estimates. Doing so results in the following
estimators:

$$\begin{aligned}
\hat{\beta}_1 &= (m_{XX} - \sigma_{uu})^{-1}m_{XY}, \\
(\hat{\sigma}_{xx}, \hat{\sigma}_{ee}) &= (m_{XX} - \sigma_{uu},\ m_{YY} - \hat{\beta}_1 m_{XY}), \\
(\hat{\mu}_x, \hat{\beta}_0) &= (\bar{X},\ \bar{Y} - \hat{\beta}_1\bar{X}),
\end{aligned}$$

where $\hat{\sigma}_{xx}$ and $\hat{\sigma}_{ee}$ can be negative with positive probability. Knowing $\sigma_{uu}$ allows
us to construct a one-to-one mapping of the minimal sufficient statistic to the
vector $(\hat{\mu}_x, \hat{\sigma}_{xx}, \hat{\beta}_0, \hat{\beta}_1, \hat{\sigma}_{ee})$. In order for these estimators to be proper estimators,
they must lie in the parameter space. So $\hat{\sigma}_{xx}$ and $\hat{\sigma}_{ee}$ must both be nonnegative.
These two estimators will both be positive as long as $m_{XX} - \sigma_{uu} > 0$ and
$m_{YY} - \hat{\beta}_1 m_{XY} > 0$, or equivalently, $m_{YY}(m_{XX} - \sigma_{uu}) - m_{XY}^2 > 0$.
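
The estimators above are simple enough to compute directly. The following Python sketch is one possible implementation, with hypothetical function names and a tiny made-up data set chosen so that the arithmetic can be followed by hand. It also reports whether the variance estimates fall inside the parameter space.

```python
def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (n - 1)

def mom_known_sigma_uu(xs, ys, sigma_uu):
    """Method-of-moments estimates when sigma_uu is treated as known."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    m_XX, m_XY, m_YY = sample_cov(xs, xs), sample_cov(xs, ys), sample_cov(ys, ys)
    sigma_xx_hat = m_XX - sigma_uu
    if sigma_xx_hat == 0:
        raise ValueError("m_XX - sigma_uu is zero; the slope estimator is undefined")
    beta1_hat = m_XY / sigma_xx_hat
    sigma_ee_hat = m_YY - beta1_hat * m_XY
    return {
        "mu_x": xbar,
        "beta0": ybar - beta1_hat * xbar,
        "beta1": beta1_hat,
        "sigma_xx": sigma_xx_hat,
        "sigma_ee": sigma_ee_hat,
        # True only when both variance estimates are positive.
        "admissible": sigma_xx_hat > 0 and sigma_ee_hat > 0,
    }

# Made-up data: m_XX = 2.5, m_XY = 5, m_YY = 10.
est = mom_known_sigma_uu([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], sigma_uu=0.5)
print(est)
```

For this data set $m_{YY}(m_{XX} - \sigma_{uu}) - m_{XY}^2 = 10(2) - 25 < 0$, so the estimate of $\sigma_{ee}$ is negative, which is exactly the improper case described in the text.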

Two other methods for making use of this type of additional information are
described next. First, a method known as simulation-extrapolation, or SIMEX, was
introduced by Cook and Stefanski [19] and is also discussed in detail in Carroll
et al. [13]. This method is employed when there is some additional information
available and is useful in the general ME model [13]. The basic idea behind
SIMEX is that in a simulation step, additional independent measurement error is
simulated and added to the original measured values in order to create additional
data sets with successively more variable values of the independent variable. In
the extrapolation step, the trend of the bias induced by the measurement error
versus the variance of the additional measurement error is determined and then the
trend is extrapolated back to the case of no measurement error to provide a nearly
unbiased parameter estimate.




The SIMEX procedure is easiest to understand under the simple linear
ME model defined by Equations 2.6 - 2.9. We still assume $\sigma_{uu}$ is known. First,
additional, independent measurement error with variance $\lambda_m\sigma_{uu}$, where
$0 = \lambda_1 < \lambda_2 < \cdots < \lambda_M$, is generated and added to the original data. Cook and
Stefanski [19, p. 1317] recommend letting $\lambda$ range from zero to two and in most of
their examples they use a rather coarse grid, namely, $\lambda \in \{0, 0.5, 1, 1.5, 2\}$. So the
total measurement error variance in the $m$th data set is $(1 + \lambda_m)\sigma_{uu}$ and the least
squares estimator of the slope parameter in the $m$th data set, $\hat{\beta}_{1,m}$, would consistently
estimate $\beta_1\sigma_{xx}/(\sigma_{xx} + (1 + \lambda_m)\sigma_{uu})$. Finally, the problem is thought of as a nonlinear
regression problem with $\hat{\beta}_{1,m}$ as the dependent variable and $\lambda_m$ as the independent
variable, having a mean function of the form $G(\lambda) = \beta_1\sigma_{xx}/(\sigma_{xx} + (1 + \lambda)\sigma_{uu})$, $\lambda \geq 0$,
and extrapolation back to $\lambda = -1$, the “no-measurement error” case, yields the
parameter estimate. See Figure 2-1 for a generic SIMEX plot.
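
The two SIMEX steps can be sketched in a few lines of Python. This is an illustrative sketch, not Cook and Stefanski's implementation. It relies on a shortcut that holds only for the mean function $G(\lambda)$ stated above: $1/G(\lambda)$ is linear in $\lambda$, so the extrapolation step fits a straight line to the reciprocals of the averaged slopes and evaluates it at $\lambda = -1$. The function names, the number of simulated data sets per $\lambda$, and the simulated data are all assumptions of this example.

```python
import random

def ols_line(xs, ys):
    """Return (intercept, slope) of the least squares line of ys on xs."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
             / sum((x - xbar) ** 2 for x in xs))
    return ybar - slope * xbar, slope

def simex_slope(xs, ys, sigma_uu, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_rep=20, seed=0):
    """SIMEX estimate of the slope with sigma_uu treated as known."""
    rng = random.Random(seed)
    mean_slopes = []
    for lam in lambdas:
        sd = (lam * sigma_uu) ** 0.5
        reps = 1 if lam == 0 else n_rep  # no extra noise is added at lambda = 0
        total = 0.0
        for _ in range(reps):
            # Simulation step: add extra error with variance lam * sigma_uu.
            noisy = [x + rng.gauss(0.0, sd) for x in xs]
            total += ols_line(noisy, ys)[1]
        mean_slopes.append(total / reps)
    # Extrapolation step: 1/G(lambda) = a + b*lambda, evaluated at lambda = -1.
    a, b = ols_line(list(lambdas), [1.0 / s for s in mean_slopes])
    return 1.0 / (a - b), mean_slopes

# Simulated data with beta1 = 2, sigma_xx = 1, sigma_uu = 0.5 (illustrative values).
rng = random.Random(1)
true_x = [rng.gauss(0.0, 1.0) for _ in range(4000)]
xs = [x + rng.gauss(0.0, 0.5 ** 0.5) for x in true_x]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in true_x]
estimate, slopes = simex_slope(xs, ys, sigma_uu=0.5)
print("averaged slopes by lambda:", [round(s, 3) for s in slopes])
print("SIMEX estimate:", round(estimate, 3))
```

The averaged slopes shrink as $\lambda$ grows, and the extrapolated value should land much closer to the true slope than the naive estimate at $\lambda = 0$. With other models the reciprocal trick does not apply and a general nonlinear or polynomial extrapolant would be needed.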

One major drawback to the SIMEX method of bias correction, as noted by
Cook and Stefanski [19, p. 1324], is that “in applications to real data, it is generally
the case that all assumptions are violated to some extent, and real data are
seldom as amenable to analysis as simulated data.” So although their simulation
results are encouraging, in that they point to a method which is useful and easily
applicable to reduce measurement error bias, impressions taken from these
simulations may be optimistic, and one should be wary of problems that arise in
applications to real data.

A second general approach to incorporating additional identifying information
into a measurement error analysis was first introduced by Prentice [44] for the
proportional hazards model and as a general approach by Carroll and Stefanski
[15]. It is known as the regression calibration method. This method is discussed
in detail by Carroll et al. [13]. Regression calibration, like SIMEX, cannot be
implemented without some additional information. In fact, these should be viewed
as methods of incorporating the additional information that allows parameter
identification.

Figure 2-1: SIMEX Plot. A generic SIMEX plot showing the effect of measurement
error of size $(1 + \lambda)\sigma_{uu}$ on parameter estimates. Note that the SIMEX estimate
occurs at $\lambda = -1$ and the naive estimate occurs at $\lambda = 0$.

The basic idea of regression calibration is to replace $X_i$ by an estimate of
$E(x_i \mid Z_i, X_i)$, where $Z_i$ are covariates measured without error, and then perform
a standard analysis. Note that there are no $Z_i$ covariates in the simple linear ME
model, but we include them in this discussion because most existing literature
which specifically implements regression calibration includes such error free
covariates. One drawback to this method is that estimating $E(x_i \mid Z_i, X_i)$
often requires information that is specific to the current problem only, and there
is no general methodology to estimate this quantity. In the simple linear ME
model with known $\kappa_{xx}$ or $\sigma_{uu}$, the regression calibration method produces the
usual correction for attenuation. This is easily seen with the present simple
linear ME model under the current normality assumptions since it is known that
$E(x_i \mid X_i) = \mu_x + \sigma_{XX}^{-1}\sigma_{xx}(X_i - \mu_x)$. Note that $\bar{X}$ is an unbiased estimate of
$\mu_x$ and $\sigma_{XX}^{-1}\sigma_{xx} = \sigma_{xx}/(\sigma_{xx} + \sigma_{uu}) = \kappa_{xx}$ is known or can be estimated when
$\kappa_{xx}$ or $\sigma_{uu}$ is known or estimated. So regression calibration, under the current
model, amounts to needing to estimate or have knowledge about $\kappa_{xx}$. The resulting
estimator using the regression calibration technique under the model defined by
2.6 and 2.7 and with assumptions defined in 2.9 with $\sigma_{uu}$ known, for example,
is $\hat{\beta}_1 = (m_{XX} - \sigma_{uu})^{-1}m_{XY} = \hat{\kappa}_{xx}^{-1}\hat{\beta}_{1,OLS}$. When $\kappa_{xx}$ is known the regression
calibration estimator is $\hat{\beta}_1 = \kappa_{xx}^{-1}\hat{\beta}_{1,OLS}$.
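
The equivalence between regression calibration and the correction for attenuation can be verified numerically. In the Python sketch below (hypothetical function names, made-up data, $\sigma_{uu}$ treated as known), each $X_i$ is replaced by the estimated conditional mean $\bar{X} + \hat{\kappa}_{xx}(X_i - \bar{X})$ and an ordinary least squares fit is then run on the calibrated values.

```python
def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (n - 1)

def regression_calibration_slope(xs, ys, sigma_uu):
    """Slope from regressing ys on the calibrated values of xs."""
    n = len(xs)
    xbar = sum(xs) / n
    m_XX = sample_cov(xs, xs)
    kappa_hat = (m_XX - sigma_uu) / m_XX  # estimated reliability ratio
    # Replace each X_i by the estimate of E(x_i | X_i) under normality.
    calibrated = [xbar + kappa_hat * (x - xbar) for x in xs]
    slope = sample_cov(calibrated, ys) / sample_cov(calibrated, calibrated)
    return slope, kappa_hat

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
rc_slope, kappa_hat = regression_calibration_slope(xs, ys, sigma_uu=0.5)
naive = sample_cov(xs, ys) / sample_cov(xs, xs)
print(rc_slope, naive / kappa_hat)
```

The calibrated regression returns the naive slope divided by $\hat{\kappa}_{xx}$, which is the same value as $(m_{XX} - \sigma_{uu})^{-1}m_{XY}$.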
Repeated  Measures  of  True  Values 

Independent, repeated measures of the truth allow for an independent
estimate of $\sigma_{uu}$ in the simple linear ME model. Suppose there are two replicate
observations; then we would have $X_{i1} = x_i + u_{i1}$ and $X_{i2} = x_i + u_{i2}$. So
$(X_{i1} - X_{i2}) = (u_{i1} - u_{i2})$ and $S_{X_1 - X_2} = \widehat{Var}(u_{i1} - u_{i2}) = 2\hat{\sigma}_{uu}$. Therefore
$\hat{\sigma}_{uu} = S_{X_1 - X_2}/2$. Then a consistent estimate of $\kappa_{xx}$, the reliability ratio, is
$\hat{\kappa}_{xx} = (m_{XX} - \hat{\sigma}_{uu})/m_{XX}$, since $Var(X) = Var(x) + Var(u)$ under assumption 2.9,
and the resulting estimator, derived from the method of moments, is $\hat{\beta}_{1,OLS}/\hat{\kappa}_{xx}$,
the bias adjusted version of $\hat{\beta}_{1,OLS}$. As in the case with known measurement error
variance, this estimator is also the regression calibration estimator. Usually in cases
with repeated observations, the replicate means may be thought of as a better measure
of $x$ than a single observation [13, p. 13]. Therefore these replicate means are used in
place of the single observation throughout the analysis when repeated measures are
available, i.e., rewrite the model as $Y_i = \beta_0 + \beta_1 x_i + e_i$ and $\bar{X}_{i\cdot} = x_i + \bar{u}_{i\cdot}$.
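
A minimal Python sketch of the replicate-based correction follows. It applies the formulas above to the first replicate only and does not average the replicates. The function names and the small data set are invented for illustration.

```python
def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (n - 1)

def replicate_correction(x1, x2, ys):
    """Estimate sigma_uu from two replicates and correct the naive slope.

    Returns (sigma_uu_hat, kappa_hat, corrected_slope). The naive slope and
    kappa_hat are computed from the first replicate x1.
    """
    diffs = [a - b for a, b in zip(x1, x2)]
    # Var(u_i1 - u_i2) = 2 * sigma_uu, so halve the variance of the differences.
    sigma_uu_hat = sample_cov(diffs, diffs) / 2.0
    m_XX = sample_cov(x1, x1)
    kappa_hat = (m_XX - sigma_uu_hat) / m_XX
    naive = sample_cov(x1, ys) / m_XX
    return sigma_uu_hat, kappa_hat, naive / kappa_hat

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
sigma_uu_hat, kappa_hat, corrected = replicate_correction(x1, x2, ys)
print(sigma_uu_hat, kappa_hat, corrected)
```

Here the differences have sample variance 0.3, so $\hat{\sigma}_{uu} = 0.15$, and the naive slope of 2 is inflated by the factor $1/\hat{\kappa}_{xx} = 3.5/3.35$.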

With an independent estimate of $\sigma_{uu}$, Carroll et al. [13] state that either
SIMEX or the regression calibration technique can be used to develop estimators
for $\beta_1$. The regression calibration estimator is given there as
$\hat{\beta}_{1,OLS}\, m_{XX}/(m_{XX} - \hat{\sigma}_{uu}) = \hat{\kappa}_{xx}^{-1}\hat{\beta}_{1,OLS}$.

Instrumental  Variables 

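Assume, again, that the model defined by 2.6 and 2.7, the simple linear ME
model, holds. So, we have

$$\begin{pmatrix} x_i \\ e_i \\ u_i \end{pmatrix} \sim_{iid} N\left( \begin{pmatrix} \mu_x \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{xx} & \sigma_{xe} & \sigma_{xu} \\ \sigma_{xe} & \sigma_{ee} & \sigma_{eu} \\ \sigma_{xu} & \sigma_{eu} & \sigma_{uu} \end{pmatrix} \right). \tag{2.12}$$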

In this context, the definition of an instrumental variable (IV), given by
Carroll et al. [13, p. 107], is a variable that is correlated with the true variable,
$x_i$, but is uncorrelated with the measurement error, $u_i$, and the model error,
$e_i$. One possible choice of an instrumental variable is a second measure of $x_i$
from a conditionally independent (given $x_i$) method of measurement (see, for
example, [17]), or similarly from a second response variable that is correlated to
$x_i$ but independent both of $u_i$ and $e_i$. Greenland [28] provides a nice introduction
and overview of IVs for a non-statistical audience by discussing their role in
epidemiological studies. We shall denote the instrumental variable as $W_i$. Fuller
[24, p. 51] provides a formal definition. According to his definition, under the
model defined by 2.6 and 2.7, $W_i$ is an instrumental variable if the following two
conditions are met:


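$$\begin{aligned}
\text{(i)}\quad & E\left\{ n^{-1}\sum_{i=1}^{n}(W_i - \bar{W})(e_i, u_i) \right\} = (0, 0), \\
\text{(ii)}\quad & E\left\{ n^{-1}\sum_{i=1}^{n}(W_i - \bar{W})x_i \right\} \neq 0,
\end{aligned} \tag{2.13}$$

where $\bar{W} = n^{-1}\sum_{i=1}^{n} W_i$. For convenience, let us express the fact that $x_i$ and
$W_i$ are related by using a parametric expression. To do this, we shall denote the
parameters of the population regression of $x_i$ on $W_i$ by $\pi_2$, for the slope, and $\pi_1$, for
the intercept, where

$$\pi_2 = E\left\{ \left[ \sum_{i=1}^{n}(W_i - \bar{W})^2 \right]^{-1} \sum_{i=1}^{n}(W_i - \bar{W})x_i \right\}, \qquad \pi_1 = E\left\{ \bar{x} - \pi_2\bar{W} \right\}.$$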


Using these parameters in the regression equation of $x_i$ on $W_i$, we can write
$x_i = \pi_1 + \pi_2 W_i + r_i$, $i = 1, 2, \ldots, n$. Here, $r_i$ represents the failure of $x_i$ to be perfectly
linearly related to $W_i$, or simply, the model error in this regression equation. By
the least squares regression method used to construct this equation, $r_i$ has zero
correlation with $W_i$, i.e., the “error” term is independent of the “independent
variable.” By substituting the fact that $X_i = x_i + u_i$, we have
$X_i = \pi_1 + \pi_2 W_i + (r_i + u_i) = \pi_1 + \pi_2 W_i + a_i$. Note that $E\{\sum_{i=1}^{n} W_i a_i\} = 0$ by the
assumption $E\{n^{-1}\sum_{i=1}^{n}(W_i - \bar{W})u_i\} = 0$.

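Here $\mu_x = \pi_1 + \pi_2\mu_W$ and note that we do not assume that $\sigma_{eu}$ or $\sigma_{xu}$ is
zero. Assuming the IV is normally distributed, along with the assumptions from
Equation 2.12, we have that the vector of observed data, $(Y_i, X_i, W_i)$, is normally
distributed with mean $(\mu_Y, \mu_X, \mu_W) = (\beta_0 + \beta_1\mu_x, \mu_x, \mu_W)$ and
variance-covariance matrix

$$\begin{pmatrix}
\beta_1^2\sigma_{xx} + 2\beta_1\sigma_{xe} + \sigma_{ee} & \beta_1\sigma_{xx} + \sigma_{xe} + \beta_1\sigma_{xu} + \sigma_{eu} & \beta_1\pi_2\sigma_{WW} \\
\beta_1\sigma_{xx} + \sigma_{xe} + \beta_1\sigma_{xu} + \sigma_{eu} & \sigma_{xx} + 2\sigma_{xu} + \sigma_{uu} & \pi_2\sigma_{WW} \\
\beta_1\pi_2\sigma_{WW} & \pi_2\sigma_{WW} & \sigma_{WW}
\end{pmatrix}.$$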

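Under the model defined by 2.6, 2.7, and 2.12 and in the presence of an
instrumental variable, there are 12 independent unknown parameters, namely
$\beta_0$, $\beta_1$, $\mu_W$, $\pi_1$, $\pi_2$, $\sigma_{xx}$, $\sigma_{xe}$, $\sigma_{ee}$, $\sigma_{xu}$, $\sigma_{eu}$, $\sigma_{uu}$, and $\sigma_{WW}$. This easily can be seen
because we have the following assumptions:

$$\begin{pmatrix} x_i \\ e_i \\ u_i \\ W_i \end{pmatrix} \sim_{iid} N\left( \begin{pmatrix} \mu_x \\ 0 \\ 0 \\ \mu_W \end{pmatrix}, \begin{pmatrix} \sigma_{xx} & \sigma_{xe} & \sigma_{xu} & \sigma_{xW} \\ \sigma_{xe} & \sigma_{ee} & \sigma_{eu} & 0 \\ \sigma_{xu} & \sigma_{eu} & \sigma_{uu} & 0 \\ \sigma_{xW} & 0 & 0 & \sigma_{WW} \end{pmatrix} \right).$$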

Note, again, that $\mu_x = \pi_1 + \pi_2\mu_W$ and we assume $\sigma_{xW} = \pi_2\sigma_{WW} \neq 0$. There are
9 sample statistics that make up the set of minimal sufficient statistics for a sample
of $n$ observations. They are from the sample mean vector and sample
variance-covariance matrix of $(Y_i, X_i, W_i)$. We know, therefore, that we cannot hope to
estimate all 12 population parameters. Nevertheless, we can develop an estimator
for $\beta_1$ by noting that the ratio of covariances
$\sigma_{XW}^{-1}\sigma_{YW} = (\pi_2\sigma_{WW})^{-1}\beta_1\pi_2\sigma_{WW} = \beta_1$. So, we can estimate $\beta_0$ and $\beta_1$ by

$$\hat{\beta}_1 = m_{XW}^{-1}m_{YW}, \tag{2.14}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}, \tag{2.15}$$

where $\bar{Y}$ and $\bar{X}$ are the sample means and $m_{XW}$ and $m_{YW}$ are the sample covariances.
Under the stated assumptions, $\hat{\beta}_0$ and $\hat{\beta}_1$ are consistent estimators, since the
sample moments are consistent estimators of the population moments.
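
The instrumental variable estimators in 2.14 and 2.15 can be tried on simulated data. The Python sketch below is only an illustration: the data-generating values, the mutually independent errors, and the function names are assumptions of this example. It compares the IV slope with the naive least squares slope.

```python
import random

def cov(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((x - abar) * (y - bbar) for x, y in zip(a, b)) / (n - 1)

def iv_estimates(ys, xs, ws):
    """Instrumental variable estimates (beta0_hat, beta1_hat) from 2.14 and 2.15."""
    m_XW = cov(xs, ws)
    if m_XW == 0:
        raise ValueError("m_XW is zero; W carries no information about x")
    beta1_hat = cov(ys, ws) / m_XW
    beta0_hat = sum(ys) / len(ys) - beta1_hat * sum(xs) / len(xs)
    return beta0_hat, beta1_hat

def simulate(n=5000, beta0=1.0, beta1=2.0, pi1=0.5, pi2=1.0, seed=1):
    """Simulate (Y_i, X_i, W_i) with x = pi1 + pi2*W + r and X = x + u."""
    rng = random.Random(seed)
    ys, xs, ws = [], [], []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        x = pi1 + pi2 * w + rng.gauss(0.0, 0.5)          # true value
        ys.append(beta0 + beta1 * x + rng.gauss(0.0, 0.5))
        xs.append(x + rng.gauss(0.0, 0.7))               # measured with error
        ws.append(w)
    return ys, xs, ws

ys, xs, ws = simulate()
b0, b1 = iv_estimates(ys, xs, ws)
naive = cov(xs, ys) / cov(xs, xs)
print(f"IV: beta0 {b0:.3f}, beta1 {b1:.3f}; naive slope {naive:.3f}")
```

With these settings the naive slope is attenuated, while the IV estimates should fall close to the generating values of 1 and 2 without any knowledge of $\sigma_{uu}$.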




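Fuller [24, p. 53] provides a theorem and proof concerning the asymptotics
of the estimators given in 2.14 and 2.15. Under the given model assumptions,
along with the assumptions that $\sigma_{We}$ and $\sigma_{Wu}$ are zero, $\sigma_{xW} \neq 0$, and
$E\{(W_i - \mu_W)^2 v_i^2\} = \sigma_{WW}\sigma_{vv}$ and $E\{v_i^2(W_i - \mu_W)\} = 0$, where $v_i = e_i - \beta_1 u_i$, then

$$n^{1/2}\begin{pmatrix} \hat{\beta}_0 - \beta_0 \\ \hat{\beta}_1 - \beta_1 \end{pmatrix} \xrightarrow{L} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{vv} + \mu_x^2 V_{22} & -\mu_x V_{22} \\ -\mu_x V_{22} & V_{22} \end{pmatrix} \right),$$

where $V_{22} = \sigma_{xW}^{-2}\sigma_{WW}\sigma_{vv}$.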

Carter and Fuller [18] discuss instrumental variable estimation and its
properties in the simple ME model with slightly more restrictive assumptions. They
assumed that $(x_i, e_i, u_i, W_i)'$, $i = 1, 2, \ldots, n$, are distributed as independent
drawings from a multivariate normal distribution with a zero mean vector and a
covariance matrix, $\Sigma$, given by

$$\Sigma = \begin{pmatrix} \sigma_{xx} & 0 & 0 & \sigma_{xW} \\ 0 & \sigma_{ee} & \sigma_{eu} & 0 \\ 0 & \sigma_{eu} & \sigma_{uu} & 0 \\ \sigma_{xW} & 0 & 0 & \sigma_{WW} \end{pmatrix}.$$

They derived restricted maximum likelihood estimators for the models
where the error covariance, $\sigma_{eu}$, is known to be zero and where $\sigma_{eu}$ is unknown.
The consistency of the restricted maximum likelihood estimator of $\beta_1$ was also
shown and its asymptotic distribution derived. Details of their work under these
assumptions can be found in an Iowa State University dissertation by Carter [16].

Recall in regression calibration an estimate of $E(x_i \mid Z_i, X_i)$, where $Z_i$ are
covariates measured without error (which do not exist in the present model), may
be used to replace $X_i$ and then a standard analysis can be run. In the presence of
an unbiased IV $W_i$ (i.e., where $E(x_i \mid Z_i, X_i) = E(W_i \mid Z_i, X_i)$, which is the
case, for example, when $x_i = \pi_0 + \pi_1 W_i + r_i$ and $\pi_0 = 0$, $\pi_1 = 1$), Carroll et al.
[13] point out that $\hat{E}(W_i \mid Z_i, X_i)$ is obtained from the regression of $W_i$ on $Z_i$
and $X_i$ and is an unbiased estimate of $E(x_i \mid Z_i, X_i)$. When no such unbiased
instrument exists, which is usually the case, and one wishes to estimate $E(x_i \mid X_i)$,
recall under normality this expectation is $\mu_x + \sigma_{XX}^{-1}\sigma_{xx}(X_i - \mu_x)$, where $\mu_x$ is
estimated unbiasedly by $\bar{X}$ and one is left with estimation of $\sigma_{XX}^{-1}\sigma_{xx} = \kappa_{xx}$.
When an IV is present, it can be shown that an estimate of $\kappa_{xx}$ may be obtained
by regressing $W$ on $X$. The resulting estimator is $\hat{\kappa}_{xx} = m_{XY}m_{XW}(m_{YW}m_{XX})^{-1}$,
which can be seen from the variance-covariance matrix of the observed data since

$$\sigma_{XY}\sigma_{XW}(\sigma_{YW}\sigma_{XX})^{-1} = \beta_1\sigma_{xx}\pi_2\sigma_{WW}\left(\beta_1\pi_2\sigma_{WW}(\sigma_{xx} + \sigma_{uu})\right)^{-1} = \sigma_{xx}(\sigma_{xx} + \sigma_{uu})^{-1},$$

when $\sigma_{xe}$, $\sigma_{xu}$, and $\sigma_{eu}$ are zero.

SIMEX estimators rely on knowing or estimating the measurement error
variance and using this to generate data sets with successively larger measurement
error variances and using these new data sets then to extrapolate back to the no
measurement error case. Because the SIMEX method uses measurement error
variance as the identifying piece of information, SIMEX estimators could be used
with instrumental variables by using the IV to estimate $\sigma_{uu}$ and then using $\hat{\sigma}_{uu}$ in
the SIMEX procedure. This, of course, would be unnecessary except in a study to
compare these two methods of estimation.

2.3  The  Multivariable  Linear  ME  Model  Under  Normality 

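Now we consider the linear ME model that contains more than one $x$ variable.
Let

$$Y_i = \beta_0 + \mathbf{x}_i'\boldsymbol{\beta}_1 + e_i, \tag{2.16}$$

$$\mathbf{X}_i = \mathbf{x}_i + \mathbf{u}_i, \quad i = 1, 2, \ldots, n, \tag{2.17}$$

where $\mathbf{x}_i$ is a $k$-dimensional column vector, $\boldsymbol{\beta}_1$ is a $k$-dimensional column vector,
and the $(k + 1)$-dimensional vectors $\boldsymbol{\varepsilon}_i = (e_i, \mathbf{u}_i')'$ are independent normal $(\mathbf{0}, \Sigma_{\varepsilon\varepsilon})$
random vectors. If we assume $\mathbf{x}_i$ in Equation 2.17 to be normally distributed and
independent of $e_i$ and $\mathbf{u}_i$, then we have the following:

$$\begin{pmatrix} \mathbf{x}_i \\ e_i \\ \mathbf{u}_i \end{pmatrix} \sim_{iid} N\left( \begin{pmatrix} \boldsymbol{\mu}_x \\ 0 \\ \mathbf{0} \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \sigma_{ee} & \Sigma_{eu} \\ \mathbf{0} & \Sigma_{eu}' & \Sigma_{uu} \end{pmatrix} \right). \tag{2.18}$$

It is worth noting here that $\Sigma_{eu}$ is a $1 \times k$ row vector.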

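As in the simple ME model, for different estimation purposes, $e_i$ and $\mathbf{u}_i$
may be assumed to be independent. The vector of observed data, $(Y_i, \mathbf{X}_i')$, has
a multivariate normal distribution with mean vector
$E\{(Y_i, \mathbf{X}_i')\} = (\mu_Y, \boldsymbol{\mu}_X') = (\beta_0 + \boldsymbol{\mu}_x'\boldsymbol{\beta}_1, \boldsymbol{\mu}_x')$ and variance-covariance matrix

$$Var\{(Y_i, \mathbf{X}_i')\} = \begin{pmatrix} \sigma_{YY} & \Sigma_{XY}' \\ \Sigma_{XY} & \Sigma_{XX} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\beta}_1'\Sigma_{xx}\boldsymbol{\beta}_1 + \sigma_{ee} & (\Sigma_{xx}\boldsymbol{\beta}_1)' + \Sigma_{eu} \\ \Sigma_{xx}\boldsymbol{\beta}_1 + \Sigma_{eu}' & \Sigma_{xx} + \Sigma_{uu} \end{pmatrix}.$$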

2.3.1  Non-Identifiability  and  Bias 

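In addition to the model defined in Equations 2.16 and 2.17 with assumptions
defined in 2.18, let us assume $e_i$ and $\mathbf{u}_i$ are independent, for all $i$. Then we have the
vector

$$(\mathbf{x}_i', e_i, \mathbf{u}_i')' \sim_{ind} N\left[ (\boldsymbol{\mu}_x', 0, \mathbf{0}')',\ \text{block diag}(\Sigma_{xx}, \sigma_{ee}, \Sigma_{uu}) \right]$$

and the variance-covariance matrix for the observed data becomes

$$Var\{(Y_i, \mathbf{X}_i')\} = \begin{pmatrix} \sigma_{YY} & \Sigma_{XY}' \\ \Sigma_{XY} & \Sigma_{XX} \end{pmatrix} = \begin{pmatrix} \boldsymbol{\beta}_1'\Sigma_{xx}\boldsymbol{\beta}_1 + \sigma_{ee} & (\Sigma_{xx}\boldsymbol{\beta}_1)' \\ \Sigma_{xx}\boldsymbol{\beta}_1 & \Sigma_{xx} + \Sigma_{uu} \end{pmatrix}.$$

Naively estimating the $\boldsymbol{\beta}_1$ parameter vector using ordinary least squares
gives the estimator $\hat{\boldsymbol{\beta}}_{1,OLS} = S_{XX}^{-1}S_{XY}$, which is biased when $\Sigma_{uu} \neq 0$ because
$E(\hat{\boldsymbol{\beta}}_{1,OLS}) = (\Sigma_{xx} + \Sigma_{uu})^{-1}\Sigma_{xx}\boldsymbol{\beta}_1$, when $\mathbf{X}_i$ and $Y_i$ are jointly normal.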

The structural ME model defined by Equations 2.16 and 2.17, with assumptions
in 2.18, is not identified without additional information, according to Fuller’s
definition of identifiability stated in Section 2.2.1. Having the additional information
that $\Sigma_{eu}$ and $\Sigma_{uu}$ are known, Fuller [24] states that the model is identified
because in that case the number of unknown parameters in the vector $\theta$, namely
$[\boldsymbol{\mu}_x', \beta_0, \boldsymbol{\beta}_1', \sigma_{ee}, (\text{vech}\,\Sigma_{xx})']$, where vech $A$ of a symmetric matrix $A$ is the vector
formed from the distinct elements of $A$ (i.e., vech $A = (a_{11}, a_{21}, a_{31}, a_{22}, a_{32}, a_{33})'$
if $A$ is a $3 \times 3$ matrix), is equal to the number of elements in the vector of statistics
$[\bar{Y}, \bar{\mathbf{X}}', (\text{vech}\,\widehat{Var}\{(Y_i, \mathbf{X}_i')\})']$, where $\widehat{Var}\{(Y_i, \mathbf{X}_i')\}$ is the MLE of $Var\{(Y_i, \mathbf{X}_i')\}$.

By contrast, when $\Sigma_{eu}$ and $\Sigma_{uu}$ are unknown, the number of unknown parameters
in $\theta$ exceeds the number of distinct elements in the vector of statistics
$[\bar{Y}, \bar{\mathbf{X}}', (\text{vech}\,\widehat{Var}\{(Y_i, \mathbf{X}_i')\})']$, meaning that it is not possible to find a unique
relationship between $\theta$ and the distribution of the vector $(Y_i, \mathbf{X}_i')$. So applying
Fuller’s definition gives the result that without additional information, the model is
not identified.

2.3.2  Additional  Information  Required 


Known  Parameters 

We shall now see that when some parameters are assumed known, applying
likelihood methods to the structural form of model 2.16 and 2.17 gives estimators
that are consistent and asymptotically normal. Let $\mathbf{x}$ be random in the model
defined by 2.16 and 2.17, under assumptions defined in 2.18. Assume that $\Sigma_{uu}$
and $\Sigma_{eu}$ are both known. Fuller [24] shows that the number of parameters in the
vector $\theta$ is equal to the number of elements in the vector of sufficient statistics.
Therefore, because the sample mean vector and $(n - 1)/n$ times the sample
covariance matrix maximize the likelihood, the functional invariance property of
the maximum likelihood method gives the maximum likelihood estimators adjusted
for degrees of freedom as

$$\begin{aligned}
(\hat{\boldsymbol{\mu}}_x', \hat{\beta}_0) &= (\bar{\mathbf{X}}',\ \bar{Y} - \bar{\mathbf{X}}'\hat{\boldsymbol{\beta}}_1), \\
\hat{\boldsymbol{\beta}}_1 &= (m_{XX} - \Sigma_{uu})^{-1}(m_{XY} - \Sigma_{ue}), \\
\hat{\sigma}_{ee} &= m_{YY} - 2m_{YX}\hat{\boldsymbol{\beta}}_1 + \hat{\boldsymbol{\beta}}_1' m_{XX}\hat{\boldsymbol{\beta}}_1 + 2\Sigma_{eu}\hat{\boldsymbol{\beta}}_1 - \hat{\boldsymbol{\beta}}_1'\Sigma_{uu}\hat{\boldsymbol{\beta}}_1, \\
\hat{\Sigma}_{xx} &= m_{XX} - \Sigma_{uu},
\end{aligned}$$

where $m$ represents the unbiased sample covariance matrix. Again, these estimators
could result in negative estimates with positive probability; however, the quantities
given are proper estimators provided $\hat{\sigma}_{ee}$ is greater than zero and $\hat{\Sigma}_{xx}$ is positive
definite.
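
For concreteness, the slope and intercept estimators above can be coded without any matrix library. The Python sketch below is a minimal illustration with hypothetical names: `solve` is a small Gaussian elimination helper written for this example, `Sigma_ue` is passed as a plain list of length $k$, and the data set is invented so that the result can be checked by hand.

```python
def solve(a, b):
    """Solve the k x k linear system a x = b by Gaussian elimination with pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - s) / m[r][r]
    return x

def mom_multivariable(X, Y, Sigma_uu, Sigma_ue):
    """Return (beta0_hat, beta1_hat) with Sigma_uu and Sigma_ue treated as known.

    X is a list of n rows of length k, Y a list of n responses.
    """
    n, k = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(k)]
    ybar = sum(Y) / n
    m_XX = [[sum((r[a] - xbar[a]) * (r[b] - xbar[b]) for r in X) / (n - 1)
             for b in range(k)] for a in range(k)]
    m_XY = [sum((r[a] - xbar[a]) * (y - ybar) for r, y in zip(X, Y)) / (n - 1)
            for a in range(k)]
    lhs = [[m_XX[a][b] - Sigma_uu[a][b] for b in range(k)] for a in range(k)]
    rhs = [m_XY[a] - Sigma_ue[a] for a in range(k)]
    beta1 = solve(lhs, rhs)
    beta0 = ybar - sum(xbar[j] * beta1[j] for j in range(k))
    return beta0, beta1

# Invented data: Y = 1 + 2*X1 + 3*X2 exactly, with m_XX = diag(1/3, 1/3).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
Y = [1.0, 3.0, 4.0, 6.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
b0_plain, b1_plain = mom_multivariable(X, Y, zero, [0.0, 0.0])
half = [[1.0 / 6.0, 0.0], [0.0, 1.0 / 6.0]]
b0_corr, b1_corr = mom_multivariable(X, Y, half, [0.0, 0.0])
print(b0_plain, b1_plain)
print(b0_corr, b1_corr)
```

With $\Sigma_{uu} = 0$ the function reduces to ordinary least squares. Declaring that half of each observed variance is measurement error doubles both slope estimates, the multivariable analogue of the correction for attenuation.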

Carroll et al. [13, p. 95] show that in multiple linear regression the SIMEX
estimator with the error variance known results in the usual method of moments
estimator when an extrapolant function of the form $G(\lambda, \Gamma) = \gamma_1 + \gamma_2/(\gamma_3 + \lambda)$, where
$\Gamma = (\gamma_1, \gamma_2, \gamma_3)'$, is used.

Repeated  Measures 

Usually in practice, the matrices $\Sigma_{uu}$ and $\Sigma_{ue}$ are unknown, but they can
be estimated when repeated measures are observed. If the measurement error
covariance matrices are to be estimated, more details must be given about the
model defined in 2.16 and 2.17 since $e_i$ may be composed of two parts. The true
$y_i$ should be expressed with error in the equation, i.e., $y_i = \mathbf{x}_i'\boldsymbol{\beta} + q_i$, where $q_i$ are
independent $(0, \sigma_{qq})$ random variables that are independent of $\mathbf{x}_j$ for all $i$ and $j$.
We observe $(Y_i, \mathbf{X}_i') = (y_i, \mathbf{x}_i') + (w_i, \mathbf{u}_i')$, where $(w_i, \mathbf{u}_i')' = \mathbf{a}_i \sim NI(\mathbf{0}, \Sigma_{aa})$, and
$\mathbf{a}_i$ is independent of $(q_j, \mathbf{x}_j')$ for all $i$ and $j$. Fuller [24, p. 106] notes that generally
the variance of $q_i$ is not known; however, it is possible to conduct experiments to
estimate the covariance matrix of $\mathbf{a}_i' = (w_i, \mathbf{u}_i')$. He also provides the estimator
of the coefficient vector, which is analogous to $\hat{\boldsymbol{\beta}}_1$ in the known parameter case, as
$\hat{\boldsymbol{\beta}} = (m_{XX} - S_{uu})^{-1}(m_{XY} - S_{uw})$, where $S_{aa}$ is an unbiased estimator of $\Sigma_{aa}$, allowing
one to obtain $S_{uu}$ and $S_{uw}$. Note that $\boldsymbol{\beta}$ now contains terms for the intercept
and the slope parameters. Further, Fuller gives a consistent estimator of $\sigma_{qq}$ as
$\hat{\sigma}_{qq} = s_{vv} - (S_{ww} - 2\hat{\boldsymbol{\beta}}'S_{uw} + \hat{\boldsymbol{\beta}}'S_{uu}\hat{\boldsymbol{\beta}})$, where $s_{vv} = (n - k)^{-1}\sum_{i=1}^{n}(Y_i - \mathbf{X}_i'\hat{\boldsymbol{\beta}})^2$, and
gives a theorem and proof concerning the limiting distribution of the estimators
$\hat{\boldsymbol{\beta}}$ and $\hat{\sigma}_{qq}$. Letting $\hat{\theta} = (\hat{\boldsymbol{\beta}}', \hat{\sigma}_{qq})'$, the theorem states that $\hat{\theta}$ is consistent for
$\theta = (\boldsymbol{\beta}', \sigma_{qq})'$, has a normal limiting distribution, and provides a formula for its
asymptotic variance.

Fuller [23] identifies five cases corresponding to different amounts of information
available for estimating the covariance matrix of the measurement error. In all cases,
the limiting distribution of the resulting estimators, normalized by $n^{1/2}$, is normal.
The estimators he provides are consistent for $\boldsymbol{\beta}$ and he gives formulae for their
asymptotic covariance matrices.

Instrumental  Variables 

Assume the model defined by 2.16 and 2.17 with the addition of a
$q$-dimensional vector of instrumental variables, $\mathbf{W}_i$, where the number of instrumental
variables is equal to or exceeds the number of variables measured with error, i.e.,
$n > q \geq k$. Conditions, given by Fuller [24, p. 149], that $n$ and $\mathbf{W}_i$ must satisfy are
that $\sum_{i=1}^{n}\mathbf{W}_i\mathbf{W}_i'$ is nonsingular with probability one, $E\{\mathbf{W}_i(e_i, \mathbf{u}_i')\} = (\mathbf{0}, \mathbf{0})$, and
the rank of $(\sum_{i=1}^{n}\mathbf{W}_i\mathbf{W}_i')^{-1}\sum_{i=1}^{n}\mathbf{W}_i\mathbf{x}_i'$ is $k$, the dimension of $\mathbf{x}_i$, with probability
one. This specification is to allow for a zero measurement error variance for some of
the elements of $\mathbf{X}_i$. An individual element of $\mathbf{x}_i$ measured without error must be a
linear function of $\mathbf{W}_i$. Therefore, a variable measured without error may serve as the
instrumental variable for itself.

Fuller [24] developed the instrumental variable estimator for the coefficient
vector in a method similar to that used for the simple linear ME model. The
estimator is given by

$$\hat{\boldsymbol{\beta}} = (WX')^{-1}(WY),$$

where $W$ is the $q \times n$ matrix of observations on the $q$ instrumental variables. Fuller
gave a theorem and proof stating the estimator is consistent and has a normal
limiting distribution. A formula is also given for its asymptotic variance.

As previously mentioned, there is no practical reason to perform SIMEX
estimation with instrumental variables since SIMEX relies on either knowing or
estimating the measurement error variance as the required identifying piece of
information. To use SIMEX with instrumental variables, an estimate of $\Sigma_{uu}$ would be
obtained after the estimation of $\boldsymbol{\beta}$. Therefore, it is not logical to use SIMEX with IVs.

2.4  Nonlinear  ME  Models 

In this section we discuss ME models which are nonlinear in either $\mathbf{x}$ or $\boldsymbol{\beta}$.
Fuller [24] says it is conventional to consider ME models to be nonlinear only when
the $\beta$’s enter the mean function in a nonlinear manner or when the mean function
is nonlinear in the explanatory variables measured with error. Let the model be
defined by

$$y_i = h(\mathbf{x}_i; \boldsymbol{\beta}) + q_i, \tag{2.19}$$

$$Y_i = y_i + w_i, \tag{2.20}$$

$$\mathbf{X}_i = \mathbf{x}_i + \mathbf{u}_i, \tag{2.21}$$

where $i = 1, 2, \ldots, n$ and $h(\cdot)$ is a real valued continuous, nonlinear function. We
may combine Equations 2.19 and 2.20 into one equation giving $Y_i = h(\mathbf{x}_i; \boldsymbol{\beta}) + e_i$,
where $e_i$ is the sum of both measurement error, $w_i$, and random equation error, $q_i$,
if they exist, and $\boldsymbol{\varepsilon}_i = (e_i, \mathbf{u}_i')'$ are independent random vectors having mean $\mathbf{0}$ and
covariance matrix $\Sigma_{\varepsilon\varepsilon}$. Often it is assumed that the errors are normally distributed.
We will primarily be interested in estimating the unknown vector $\boldsymbol{\beta}$. This proves,
however, to be more challenging in nonlinear ME models compared to linear ME
models since, “in most situations it is not possible to obtain an explicit expression
for the maximum likelihood estimator of the parameter vector for the nonlinear
model” [24, p. 230]. Fuller [24, p. 226] provides a formal definition of a nonlinear
ME model stating the model defined by 2.19, 2.20, and 2.21 is nonlinear if $h(\mathbf{x}; \boldsymbol{\beta})$
is nonlinear in $\mathbf{x}$ when $\boldsymbol{\beta}$ is fixed or if $h(\mathbf{x}; \boldsymbol{\beta})$ is nonlinear in $\boldsymbol{\beta}$ when $\mathbf{x}$ is fixed. We
will once again focus mainly on the structural case where the $\mathbf{x}_i$ are assumed to
be random with mean $E(\mathbf{x}_i) = \boldsymbol{\mu}_x$ and variance-covariance matrix $\Sigma_{xx}$. If $\mathbf{x}_i$ are
assumed to be independent of the errors, then we have

$$\begin{pmatrix} \mathbf{x}_i \\ e_i \\ \mathbf{u}_i \end{pmatrix} \sim F\left( \begin{pmatrix} \boldsymbol{\mu}_x \\ 0 \\ \mathbf{0} \end{pmatrix}, \begin{pmatrix} \Sigma_{xx} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \sigma_{ee} & \Sigma_{eu} \\ \mathbf{0} & \Sigma_{eu}' & \Sigma_{uu} \end{pmatrix} \right), \tag{2.22}$$

where $F$ might be any distribution assumed for the model. In the next subsection,
we discuss the case where $F$ is a multivariate normal distribution.

2.4.1  Normal  Theory  Models:  Non-Identifiability  and  Bias 

This  class  of  ME  models  are  those  whose  error  distributions  are  independent 
of  their  mean  function.  Griliches  and  Ringstad  [29,  p.  370]  showed  that  the  bias  in 
the  classical  least  squares  estimators  is  exacerbated  when  the  regression  function 
is  nonlinear.  Their  results  are  for  a specific  nonlinear  model,  but  they  state  that 
their  work  “can  be  viewed,  however,  as  an  approximation  to  the  estimation  of 
more  general  nonlinear  models” . They  assumed  additive  measurement  error, 

X = x + u,  with  u and  x normally  distributed,  E(u)  = 0,  E(xu)  = 0,  and 
parameterized  the  model  such  that  oxx  — 1-  The  authors  denoted  auu  by  A.  Their 
model  was  T = ft  + ftx  + (32x2  + e,  nonlinear  in  the  random  variable  x , with 
E(e)  = E(xe)  = E{xie ) = 0,  and  all  variables  were  univariate.  They  showed 
that  the  naive  ordinary  least  squares  estimate  of  ft,  ft,  is  biased  towards  zero  by  a 
factor  of  (1  - A),  where  in  their  notation,  A was  the  fraction  of  error  variance  in  the 
total  variance  in  the  observed  variable,  i.e.,  auu.  The  problem  became  even  more 
serious  for  the  nonlinear  terms,  in  that  the  naive  ordinary  least  squares  estimate  of 
ft,  ft,  was  biased  towards  zero  by  the  square  of  the  bias  factor  of  the  linear  term. 




As in the linear ME model, the naive estimator would result from least squares
estimation using the observed $\underline{X}$ in place of the latent, unobserved $\underline{x}$.

In the general model, the naive approach is to fit the model $Y_i = h(\underline{X}_i; \underline{\beta}) + e_i^*$
by classical least squares or by a semiparametric or nonparametric fitting method.
Gleser [26] improves upon this naive estimator in a paper from the proceedings of
the 1989 AMS-IMS-SIAM Joint Summer Research Conference. He works under the
assumptions that $E(e) = E(\underline{u}) = 0$ and that $e$, $\underline{u}$, and $\underline{x}$ are mutually statistically
independent. In fitting the model, he takes independent and identically distributed
(i.i.d.) observations, $(Y_i, \underline{X}_i)$, on $(Y, \underline{X})$, and therefore the $(\underline{x}_i, e_i, \underline{u}_i)$ are i.i.d., each
with the same distribution as $(\underline{x}, e, \underline{u})$. In his words, the naive approach “in
general, leads to inconsistent estimators with a high degree of asymptotic bias”
[26, p. 99]. He points out that consistent estimators have been obtained in special
cases, but the general problem of finding consistent and efficient estimators is still
unsolved.

2.4.2  Normal  Theory  Models:  Additional  Information  Required 
Known  Parameters 

Gleser [26] discusses solutions to the problem of parameter estimation where
identifying parameters are assumed to be known, a case not realistic in applications,
and also the case where they are estimated from the data. His solution is to
replace $\underline{x}$ with the best linear predictor, $\hat{\underline{x}}$, of $\underline{x}$ given $\underline{X}$. This amounts to the
regression calibration method. He points out that when $\underline{\mu}_x$, $\Sigma_{xx}$, and $\Sigma_{uu}$ are known,
$\underline{x}$ and $\underline{u}$ are independent, and $\underline{X}$ is normally distributed,

$$E(\underline{x} \mid \underline{X}) = \underline{X}\Lambda + \underline{\mu}_x (I_k - \Lambda), \tag{2.23}$$

where $\Lambda = (\Sigma_{xx} + \Sigma_{uu})^{-1}\Sigma_{xx}$ is the reliability matrix, $I_k$ is the $k \times k$ identity matrix,
and $k$ is the dimension of $\underline{X}$. Of course, in practice, some of these quantities
may not be known. The equality in 2.23 does not hold when $\underline{X}$ is not normally
distributed, but the right hand side of Equation 2.23 is still the best linear (in $\underline{X}$)
predictor of $\underline{x}$ under mean square error [26]. So when these parameters are known,
Gleser suggests fitting the model $Y = h(\hat{\underline{x}}; \underline{\beta}) + e = h(E(\underline{x} \mid \underline{X}); \underline{\beta}) + e$.
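To make the best linear predictor concrete, the following is a minimal sketch for the scalar case ($k = 1$), where the reliability matrix reduces to the reliability ratio $\sigma_{xx}/(\sigma_{xx} + \sigma_{uu})$. The function names and the numerical values are hypothetical and serve only as an illustration of the calibration step.

```python
def reliability(sigma_xx, sigma_uu):
    """Scalar reliability ratio: sigma_xx / (sigma_xx + sigma_uu)."""
    return sigma_xx / (sigma_xx + sigma_uu)


def best_linear_predictor(X, mu_x, sigma_xx, sigma_uu):
    """Best linear predictor of x given X: X * lam + mu_x * (1 - lam)."""
    lam = reliability(sigma_xx, sigma_uu)
    return X * lam + mu_x * (1.0 - lam)


# Hypothetical values: the observed X = 4 is shrunk towards the mean mu_x = 0.
x_hat = best_linear_predictor(X=4.0, mu_x=0.0, sigma_xx=3.0, sigma_uu=1.0)
print(x_hat)  # 3.0
```

The predicted value would then take the place of the unobservable $x$ when the regression function is fitted.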

An iterative estimation procedure for coefficients of nonlinear “functional”
relations was proposed by Wolter and Fuller [65]. It assumes that the covariance
matrix of $(e_i, \underline{u}_i')$, i.e., $\Sigma_{eu}$, is known, or minimally that its order is known. Their estimator is a
modification of the maximum likelihood estimator for the nonlinear model with
normal measurement error. Assuming the $(e_i, \underline{u}_i')$ are independent normal $(0, \Sigma_{eu})$
random variables, the maximum likelihood estimators are those values of $\underline{x}_i$ and $\underline{\beta}$
that minimize the sum of squares

$$Q(\underline{\beta}, \underline{x}_1, \ldots, \underline{x}_n) = \sum_{i=1}^{n} \left( Y_i - h(\underline{x}_i; \underline{\beta}),\ \underline{X}_i - \underline{x}_i \right) \Sigma_{eu}^{-1} \left( Y_i - h(\underline{x}_i; \underline{\beta}),\ \underline{X}_i - \underline{x}_i \right)'.$$

Under their nonlinear model it is not possible to derive an explicit expression for
the MLE of $\underline{\beta}$, but Wolter and Fuller do develop two estimators under slightly
different assumptions on the order of $\Sigma_{eu}$ and provide theorems for the limiting
distributions of these iteratively defined estimators.

Repeated  Measures 

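In Gleser’s solution mentioned above, when $\underline{\mu}_x$ and/or $\Lambda$ are unknown, consistent
estimators $\hat{\underline{\mu}}_x$ and $\hat{\Lambda}$ can be used as substitutes. Such consistent estimators can
be obtained from the data, $\underline{X}_1, \ldots, \underline{X}_n$, and information from prior calibration studies
on $(\underline{x}, \underline{X})$ [26]. Using repeated measures, an estimate of $\Sigma_{uu}$ can be obtained
as previously mentioned in the sections on repeated measures under the linear ME
models, and $\hat{\Sigma}_{uu}$ can then be used to estimate $\Lambda$ by $\hat{\Lambda} = S_{XX}^{-1}(S_{XX} - \hat{\Sigma}_{uu})$. In such
cases, the model $Y = h(\hat{\underline{x}}; \underline{\beta}) + e$, where $\hat{\underline{x}} = \underline{X}\hat{\Lambda} + \hat{\underline{\mu}}_x(I_k - \hat{\Lambda})$, should be fitted
to estimate $\underline{\beta}$. Gleser [26] also points out that a better substitution for $h(\underline{x}; \underline{\beta})$
may be obtained from a Taylor series expansion of $h(\underline{x}; \underline{\beta})$ about $\hat{\underline{x}}$. He proved
consistency and asymptotic distributional properties for his estimator.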




Instrumental  Variables 

In addition to the model defined by 2.19, 2.20, and 2.21, assume there are
observations available on an instrumental variable, $\underline{W}_i$. In nonlinear models, IV’s
must still satisfy the same requirements as those stated in Subsection 2.2.2 for the
simple linear ME model. That is, IV’s must be correlated with $\underline{x}_i$, independent
of the measurement error in $\underline{x}_i$, i.e., independent of $\underline{u}_i$, and must also be surrogates,
i.e., $\underline{W}_i$ must be independent of $Y_i$ given $\underline{x}_i$ (which is equivalent to saying
independent of model error, $e_i$).

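Amemiya [5] gives the IV estimator and its asymptotic properties for the
nonlinear “functional” errors in variables model that is nonlinear in $\underline{x}_i$. His model
is $y_i^0 = h(\underline{x}_i^0; \underline{\beta}^0)$ with no model error, and he assumes the measurement
errors, $\underline{\epsilon}_i = (e_i, \underline{u}_i')$, to be independently distributed with zero mean and unknown
covariance matrix $\Sigma_{\epsilon\epsilon}$. Because he is working with the functional model, the $\underline{x}_i^0$ are
assumed to be fixed constants and therefore carry no distributional assumptions.
Amemiya defines the instrumental variable estimator, $\hat{\underline{\beta}}_{IV}$, of $\underline{\beta}^0$ as the value of $\underline{\beta}$ in
the parameter space that minimizes

$$Q(\underline{\beta}) = \left\{ n^{-1} \sum_{i=1}^{n} \left[ Y_i - h(\underline{X}_i; \underline{\beta}) \right] \underline{W}_i' \right\} \left\{ n^{-1} \sum_{i=1}^{n} \underline{W}_i \left[ Y_i - h(\underline{X}_i; \underline{\beta}) \right] \right\},$$

which is the squared norm of the sample covariance between $\underline{W}_i$ and $Y_i - h(\underline{X}_i; \underline{\beta})$,
where the observed $\underline{X}_i$ replaces the unobservable $\underline{x}_i^0$. The motivation behind using
$Q(\underline{\beta})$ is that if $\underline{W}_i$ is uncorrelated with the measurement errors $\underline{\epsilon}_i$, then $Y_i - h(\underline{X}_i; \underline{\beta})$
is uncorrelated with $\underline{W}_i$ when $\underline{\beta} = \underline{\beta}^0$. Therefore, to obtain an estimator of $\underline{\beta}^0$,
one minimizes $Q(\underline{\beta})$. It is noted that this estimator is equivalent to the nonlinear
two-stage least squares estimator. Theorems and proofs are provided for the
asymptotics of his estimator. It is stated that $\text{plim}_{n \to \infty}\, \hat{\underline{\beta}}_{IV} = \underline{\beta}^0$ and that
$n^{1/2}(\hat{\underline{\beta}}_{IV} - \underline{\beta}^0) \to_L N(0, T)$, where a formula for $T$, the covariance matrix of the
limiting distribution of the estimator, is given.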
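The objective function $Q(\underline{\beta})$ can be sketched in a few lines. The example below is only an illustration under stated assumptions: the regression function `h`, the toy data (constructed without noise so that the minimizer is known), and the grid search are all hypothetical and are not part of Amemiya's development, which concerns the asymptotic behavior of the minimizer.

```python
def iv_objective(beta, Y, X, W, h):
    """Squared norm of n^{-1} sum_i W_i [Y_i - h(X_i; beta)]."""
    n = len(Y)
    q = len(W[0])
    moment = [0.0] * q
    for y_i, x_i, w_i in zip(Y, X, W):
        resid = y_i - h(x_i, beta)
        for j in range(q):
            moment[j] += w_i[j] * resid / n
    return sum(m * m for m in moment)


def h(x, beta):
    """Hypothetical regression function, nonlinear in x."""
    return beta * x * x


# Toy data: Y equals X^2 exactly, so Q is zero at beta = 1.
Y = [1.0, 4.0, 9.0]
X = [1.0, 2.0, 3.0]
W = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]  # each W_i includes a constant

# Crude grid search over a scalar beta in [0, 3].
grid = [k / 100 for k in range(0, 301)]
beta_iv = min(grid, key=lambda b: iv_objective(b, Y, X, W, h))
print(beta_iv, iv_objective(beta_iv, Y, X, W, h))
```

In practice a numerical optimizer would replace the grid search, and with noisy data the minimum of $Q$ would generally be positive when there are more instruments than parameters.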




Not much research has been done on structural nonlinear ME models. Buzas
[9], however, does assume a structural model in using a regression calibration
approach with IV’s to construct unbiased score equations in nonlinear measurement
error models. His method, however, requires the use of an unbiased estimate of the
true, unknown variable.

SIMEX estimation uses knowledge about the measurement error variance
as its identifying piece of information, and therefore this type of bias adjusting
estimation is not used with instrumental variables in nonlinear ME models, except
perhaps for comparison between methods.

2.4.3  Exponential  Family  Models:  Non-Identifiability  and  Bias 

This class of ME models consists of those whose response distributions can be
a function of the mean. A generalized linear model, or GLM, is one type of
model included in this section. See McCullagh and Nelder [40] for
further details of the GLM. The observed, dependent variable, $Y_i$, is linked to the
explanatory variables, $\underline{x}_i$, as follows:

$$E(Y_i \mid \underline{x}_i) = \mu_i,$$

where $\mu_i$ is called the systematic component of $Y_i$. The dependence of $\mu_i$ on $\underline{x}_i$ is
assumed to be determined by

$$g(\mu_i) = \eta_i,$$

for a monotone specification of $g(\cdot)$, the link function, where $\eta_i = \underline{x}_i'\underline{\beta}$. The
distribution of $Y_i$ given $\underline{x}_i$ may be in the general exponential family, in which case

$$f_{Y|x}(y_i \mid \underline{x}_i) = \exp\left[ \{ y_i\theta_i - b(\theta_i) \} / a_i(\phi) + c(y_i, \phi) \right],$$

where $\theta_i$ is the canonical parameter, $E(Y_i \mid \underline{x}_i) = b'(\theta_i) = \mu_i$, $Var(Y_i \mid \underline{x}_i) =
a_i(\phi)b''(\theta_i) = a_i(\phi)V(\mu_i)$, where $V(\cdot)$ is the variance function depending on $\mu_i$ only,
and $\phi$, also denoted as $\sigma^2$, is called the scale or dispersion parameter. Commonly,
$a_i(\phi) = \phi/w_i$, where the $w_i$ are known weights.
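As a brief worked example of this form (a standard textbook case, added here only for illustration), consider a Poisson response with mean $\mu_i$. Its probability function can be written in the exponential family form above with the following components:

```latex
\begin{align*}
f(y_i \mid \underline{x}_i)
  &= \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}
   = \exp\left[ y_i \log\mu_i - \mu_i - \log(y_i!) \right], \\
\theta_i &= \log\mu_i, \qquad b(\theta_i) = e^{\theta_i}, \qquad
a_i(\phi) = 1, \qquad c(y_i, \phi) = -\log(y_i!), \\
E(Y_i \mid \underline{x}_i) &= b'(\theta_i) = e^{\theta_i} = \mu_i, \qquad
Var(Y_i \mid \underline{x}_i) = b''(\theta_i) = \mu_i = V(\mu_i).
\end{align*}
```

With the canonical (log) link, $g(\mu_i) = \log\mu_i = \eta_i$, the canonical parameter coincides with the linear predictor, $\theta_i = \underline{x}_i'\underline{\beta}$.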

Stefanski [55] discusses the bias in estimating $\underline{\beta}$ using an M-estimator. An
M-estimator provides a consistent solution when solving an unbiased estimating
equation in the absence of measurement error. M-estimators are often used to
estimate parameters in GLM’s. One lets $\psi_i(\cdot)$, $i = 1, 2, \ldots, n$, be functions of $Y_i$, $\underline{x}_i$,
and $\underline{\beta}$, which take values in a space that is of the same dimension as that of $\underline{\beta}$. An
M-estimator is a solution to an estimating equation (which is unbiased if it has
expected value 0), which has the form

$$\sum_{i=1}^{n} \psi_i(Y_i, \underline{x}_i, \hat{\underline{\beta}}) = 0.$$

In practice, the $\psi_i(\cdot)$’s are not chosen, but rather these functions are defined
through the choice of an estimator, e.g., maximum likelihood or least squares.
(See, for example, [13, 57] for more on M-estimators.) Stefanski [55] defines an
M-estimator, $\hat{\underline{\beta}}$, for an unknown $\underline{\beta}_0$ where

$$0 = n^{-1} \sum_{i=1}^{n} \psi_i(\underline{z}_i, \hat{\underline{\beta}}). \tag{2.24}$$

Here $\underline{z}_i = (y_i, \underline{x}_i')'$, $i = 1, 2, \ldots, n$, are independent random vectors such that
$E\{\psi_i(\underline{z}_i, \underline{\beta}_0)\} = 0$, where $\psi(\cdot, \cdot)$ and $\underline{\beta}_0$ are of the same dimension. In Stefanski’s
notation, the observable data consist of the pairs $(y_i, \underline{X}_i)$, where $\underline{X}_i = \underline{x}_i + \sigma\underline{v}_i$,
$E(\underline{v}_i \mid y_i, \underline{x}_i) = 0$, $E(\underline{v}_i\underline{v}_i' \mid y_i, \underline{x}_i) = \Omega(y_i, \underline{x}_i)$, and $\sigma$ is a scalar. In order for this
error structure to result in the usual additive error structure previously discussed,
i.e., $\underline{X}_i = \underline{x}_i + \underline{u}_i$, let $\underline{v}_i = \underline{u}_i/\sigma$. This error structure allows for dependence
between the measurement error and the vector $(y, \underline{x}')'$ and can accommodate
situations when the measurement error may be multiplicative or heteroscedastic.
A naive estimator is obtained by using $\underline{Z}_i = (y_i, \underline{X}_i')'$ instead of $\underline{z}_i$ in Equation
2.24. According to Stefanski [55], this estimator, denoted by $\hat{\underline{\beta}}$, converges not to
$\underline{\beta}_0$ but to $\underline{\beta}(\sigma)$, which satisfies $\lim n^{-1} \sum_{i=1}^{n} E\{\psi_i(\underline{Z}_i, \underline{\beta}(\sigma))\} = 0$. (Note that
$\underline{\beta}_0 = \underline{\beta}(\sigma = 0)$.) He notes that generally the value that $\hat{\underline{\beta}}$ converges to, i.e., $\underline{\beta}(\sigma)$,
will not be equal to the true parameter value, and thus $\hat{\underline{\beta}}$, in most cases, is
asymptotically biased. The bias can be assessed by the relation between $\underline{\beta}(\sigma)$ and
$\underline{\beta}_0$, which Stefanski [55, p. 585] derives from a Maclaurin series expansion. The
resulting relation, for $\sigma$ near zero, is


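$$\underline{\beta}_0 = \underline{\beta}(\sigma) + \tfrac{1}{2}\sigma^2 A^{-1} E\{\psi_{xx}(y, \underline{x}, \underline{\beta}_0)\,\Omega(y, \underline{x})\} + o(\sigma^2), \tag{2.25}$$

where $A = \lim n^{-1} \sum_{i=1}^{n} E\left[ \partial\psi(\underline{z}_i, \underline{\beta}) / \partial\underline{\beta} \right]_{\underline{\beta} = \underline{\beta}_0}$ and $\psi_{xx}$ refers to the second derivative
of $\psi(\cdot, \cdot, \cdot)$ with respect to its second argument, $\underline{x}$. Unfortunately, Equation 2.25 is
a function of the unknown $\underline{\beta}_0$ and the true $\underline{x}$’s. Stefanski provides an estimator
of this relation, which is his bias adjusting estimator. It further adjusts the bias
at each step, although he does not pursue higher order corrections. His method
requires knowledge or estimability of the measurement error variance, as well as of $\sigma$
in his definition of the error structure, and shall be discussed in later sections.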
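The form of this expansion can be checked in the simplest case. The following worked example is our own illustration, not Stefanski's: it assumes a scalar linear regression through the origin, $y = x\beta_0 + e$ with $E(xe) = 0$, the least squares estimating function $\psi(y, x, \beta) = x(y - x\beta)$, and a constant $\Omega$. It takes the relation in the form $\beta_0 = \beta(\sigma) + \tfrac{1}{2}\sigma^2 A^{-1} E\{\psi_{xx}\Omega\} + o(\sigma^2)$.

```latex
\begin{align*}
E\{\psi(y, X, \beta)\}
  &= E(xy) - \beta\{E(x^2) + \sigma^2\Omega\}
   = \beta_0 E(x^2) - \beta\{E(x^2) + \sigma^2\Omega\} = 0 \\
\Rightarrow\quad \beta(\sigma)
  &= \beta_0\,\frac{E(x^2)}{E(x^2) + \sigma^2\Omega},
  \qquad
  \beta_0 - \beta(\sigma) = \frac{\sigma^2\Omega\,\beta_0}{E(x^2)} + o(\sigma^2), \\
A &= E\left[\frac{\partial\psi}{\partial\beta}\right]_{\beta = \beta_0} = -E(x^2),
  \qquad
  \psi_{xx}(y, x, \beta_0) = -2\beta_0, \\
\tfrac{1}{2}\sigma^2 A^{-1} E\{\psi_{xx}\,\Omega\}
  &= \tfrac{1}{2}\sigma^2 \cdot \frac{-1}{E(x^2)} \cdot (-2\beta_0\Omega)
   = \frac{\sigma^2\Omega\,\beta_0}{E(x^2)}.
\end{align*}
```

The two expressions for the leading bias term agree, and the first line recovers the familiar attenuation of the naive least squares slope.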

2.4.4 Exponential Family Models: Additional Information Required
Known  Parameters 

Armstrong [6] provides a method for estimating $\underline{\beta}$ when the measurement
error variance is known or estimated, but requires an estimate of $E(\underline{x} \mid \underline{X})$. He
applies the Generalized Linear Interactive Modeling, or GLIM, software package,
used to perform iteratively reweighted least squares, to transformed data with
adjusted model specifications. He provides examples where the distribution of the
measurement errors, namely the measurement error variance, is assumed known
and states that the same method can be used when the measurement error variance
is estimated from replicated measurements. We introduce the procedure here and
mention it again when the measurement error variance is not assumed to be known,
but rather is estimated from repeated observations. First, the distributional form
of $Y$ given $\underline{x}$ in the original GLM is specified for $Y$ given $\underline{X}$. Next, a transformation
$t(\underline{X})$ of $\underline{X}$ is chosen, and an inverse link function $g^{*-1}$, possibly different from the
original inverse link function, is found such that

$$E(Y \mid \underline{X}) = \mu^* = g^{*-1}(t(\underline{X})'\underline{\beta}),$$

at least as an approximation. The new $g^{*-1}$ replaces $g^{-1}$ as the inverse link and
$t(\underline{X})$, the transformed data, is used as the explanatory variable. An example he
provides is the case of GLM’s with the identity link. In this case $E(Y \mid \underline{x}) = \mu =
\underline{x}'\underline{\beta}$ and hence $E(Y \mid \underline{X}) = E\{E(Y \mid \underline{x}) \mid \underline{X}\} = E(\underline{x}'\underline{\beta} \mid \underline{X}) = \{E(\underline{x} \mid \underline{X})\}'\underline{\beta}$.
So here we see $g^{*-1} = g^{-1} =$ identity and $t(\underline{X}) = E(\underline{x} \mid \underline{X})$, the posterior mean of
$\underline{x}$ given $\underline{X}$, which is the piece used to replace the unobservable $\underline{x}$ in the regression
calibration method. Finally, apply iteratively reweighted least squares where the
iterative reweighting is redefined so as to match the correct variance, $Var(Y \mid \underline{X})$.
Armstrong suggests iteratively specifying a new prior weight $w^*$ as

$$w^* = \frac{\phi\,\tau^2(\mu^*)}{(\phi/w)\,E(\tau^2(\mu) \mid \underline{X}) + Var(\mu \mid \underline{X})},$$

where $\mu$, $w$, $\phi$, and $\tau^2(\mu) = b''(\theta)$ are the mean, prior weight, scale parameter,
and variance function defined for the original GLM for $Y$ given $\underline{x}$, respectively.
He provides a theorem and proof stating that the above procedure produces maximum
quasi-likelihood estimates, henceforth MQLE’s, for $\underline{\beta}$. Armstrong [6, p. 535] also
points out that when $f(Y \mid \underline{X})$ is in the exponential family the MQLE’s are
MLE’s. Further, he notes that the MQLE’s and their estimated variances are
asymptotically unbiased.

As mentioned before, Stefanski [55] provides a bias adjusting estimator when
the error variance is known or an estimate of it is available. By the author’s
admission, needing this type of information is common, but “does restrict the
applicability of the results” [55, p. 584]. Having an initial naive estimate, $\hat{\underline{\beta}}$, the
bias adjusting estimate analogous to Equation 2.25 is


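$$\tilde{\underline{\beta}} = \hat{\underline{\beta}} + \tfrac{1}{2}\sigma^2 \left\{ \sum_{i=1}^{n} \psi_{\beta}(y_i, \underline{X}_i, \hat{\underline{\beta}}) \right\}^{-1} \sum_{i=1}^{n} \psi_{xx}(y_i, \underline{X}_i, \hat{\underline{\beta}})\,\Omega(y_i, \underline{X}_i), \tag{2.26}$$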


where $\sigma^2$ is a scalar defined in his error structure in Section 2.4.3, $\psi_{\beta}$ refers to the
first derivative of the function $\psi(\cdot, \cdot, \cdot)$ with respect to the third argument, and
$\psi_{xx}$ refers to the second derivative of $\psi(\cdot, \cdot, \cdot)$ with respect to the second argument. He provides
asymptotic results for the naive estimator as well as his bias adjusting estimator
and provides a small Monte Carlo study comparing the two estimators.

A recent publication by Aitkin and Rocci [2] uses the EM algorithm with the
assumption of known measurement error variances to estimate the parameters in
GLM’s with continuous measurement error in the independent variables.

Repeated  Measures 

When the measurement error variance, $\Sigma_{uu}$, is not assumed to be known it
can be estimated from repeated measures. We have mentioned in earlier sections
on repeated measures ways of estimating $\Sigma_{uu}$. Armstrong [6] provides a method
for estimating $\underline{\beta}$ from the data that provides MQLE’s when the measurement error
variance is estimated from replicated observations.

Stefanski [55] provides a bias adjusting estimator when the error variance
is known or can be estimated. He performs a Monte Carlo study comparing the naive
estimator with his bias adjusting estimator (given in Equation 2.26) both
when $\sigma^2$, a scalar defined in his error variance structure, is known and when it
is estimated through replicated measurements. He states that there are no significant
differences between the two procedures and actually only reports the results from
the method when $\sigma^2$ is estimated.

For binary regression models, Carroll et al. [14] use replications to estimate
$\Sigma_{uu}$ in constructing structural maximum likelihood estimates.




In a logistic regression example, Carroll et al. [13] use the regression calibration
technique with an estimate of the error variance to assess the effect of
saturated fat on the risk of developing breast cancer, adjusted for other variables.
Saturated fat is measured via a 24 hour recall of food intake and “is measured
with considerable error” and bias [13, p. 43]. They apply the regression calibration
technique and compare the naive estimator, which ignores measurement error, with the
estimator using the regression calibration method. The regression calibration method
did successfully correct for attenuation, and this is seen in a comparison of the two
parameter estimates. The naive estimate of the saturated fat coefficient was -0.97
and that from the regression calibration technique was -4.67.

Carroll  et  al.  [13]  also  compare  the  naive  estimator,  the  regression  calibration 
estimator,  and  the  SIMEX  estimator  in  another  logistic  regression  problem 
where  the  error  variance  was  estimated.  In  this  study,  the  outcome  indicates 
the  occurrence  of  coronary  heart  disease  based  on  covariates  measured  without 
error  and  systolic  blood  pressure,  the  covariate  which  is  measured  with  error. 

They  provide  the  results  from  all  three  estimation  methods  as  well  as  compare 
estimated  standard  errors  of  the  estimates,  also  calculated  via  different  methods, 
e.g.,  sandwich  and  information  standard  errors.  They  found  that  the  SIMEX  and 
regression  calibration  estimators  adjust  for  the  bias  in  comparison  to  the  naive 
estimator. 

Instrumental  Variables 

Carroll et al. [13] use an instrumental variable approach in the GLM setting
which is closely related to regression calibration. They propose three IV estimators
and, because these estimators rely on numerous regression parameters, they
use the notation employed by Stefanski and Buzas [58] to indicate different
parameters from regressing one variable on others. This notation requires
underlining variables in subscripts. Therefore, it should be understood that in the current
section, appropriate variables are still vector valued, but their underline has been
removed to allow for this new notation. For example, $\beta_{Y|\underline{1}Zx}$ is the coefficient of 1,
i.e., the intercept, in the generalized linear regression of $Y$ on 1, $Z$, and $x$, where
$Z$ are covariates measured without error; $\beta_{Y|1\underline{Z}x}$ is the coefficient for $Z$ in the
regression of $Y$ on 1, $Z$, and $x$. With this notation, it is easy to represent
subsets of coefficient vectors, e.g., $\beta_{Y|\underline{1Z}x} = (\beta_{Y|\underline{1}Zx}, \beta'_{Y|1\underline{Z}x})'$. The form of the GLM
used by Carroll et al. [13] is given by:

$$E(Y \mid Z, x) = b'(\beta_{Y|\underline{1}Zx} + Z'\beta_{Y|1\underline{Z}x} + x'\beta_{Y|1Z\underline{x}}),$$

$$Var(Y \mid Z, x) = \phi V(\beta_{Y|\underline{1}Zx} + Z'\beta_{Y|1\underline{Z}x} + x'\beta_{Y|1Z\underline{x}}),$$

where $V(\cdot)$ is the variance function and $\phi$ is the dispersion parameter. Writing the
model in this form means that Carroll et al. restrict their methods to GLM’s with
canonical link functions. This proves to be the first restriction of their methods
under GLM’s. Further, they use composite vectors to aid in the estimation
algorithms, where $\tilde{x} = (1, Z', x')'$, $\tilde{X} = (1, Z', X')'$, and $\tilde{W} = (1, Z', W')'$. So
we may define $\beta_{Y|\tilde{x}} = (\beta_{Y|\underline{1}Zx}, \beta'_{Y|1\underline{Z}x}, \beta'_{Y|1Z\underline{x}})'$ and then the basic model may be
written

$$E(Y \mid \tilde{x}) = b'(\tilde{x}'\beta_{Y|\tilde{x}}),$$

$$Var(Y \mid \tilde{x}) = \phi V(\tilde{x}'\beta_{Y|\tilde{x}}).$$

A key assumption of their methods, which is yet another restriction, is that
the regression of $x$ on $(Z, W, X)$ is approximately linear, i.e.,

$$E(x \mid Z, W, X) = \beta_{x|\underline{1}ZWX} + Z'\beta_{x|1\underline{Z}WX} + W'\beta_{x|1Z\underline{W}X} + X'\beta_{x|1ZW\underline{X}}. \tag{2.27}$$




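Further, they assume that

$$E(x - X \mid Z, x, W) = 0 \tag{2.28}$$

$$E(Y \mid Z, W, X) = E\{E(Y \mid Z, x) \mid Z, W, X\}. \tag{2.29}$$

Note that assumptions 2.27 and 2.28 imply that $E(x \mid Z, W) = E(X \mid Z, W)$
and that $\beta_{\tilde{x}|\tilde{W}} = \beta_{\tilde{X}|\tilde{W}}$. First, consider the regression of $Y$ on $\tilde{W}$. Then, under the
regression calibration approximation, we have
$E(Y \mid \tilde{W}) = b'\{E(\tilde{x} \mid \tilde{W})'\beta_{Y|\tilde{x}}\} = b'(\tilde{W}'\beta_{\tilde{x}|\tilde{W}}\beta_{Y|\tilde{x}})$. Using the critical assumption
that $\beta_{\tilde{x}|\tilde{W}} = \beta_{\tilde{X}|\tilde{W}}$, it follows that

$$\beta_{Y|\tilde{W}} = \beta_{\tilde{X}|\tilde{W}}\,\beta_{Y|\tilde{x}}. \tag{2.30}$$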

For their first estimator, Carroll et al. [13] suggest starting with the multivariate
regression of $\tilde{X}$ on $\tilde{W}$ to obtain $\hat{\beta}_{\tilde{X}|\tilde{W}}$. This is used to obtain their estimate of
$E(\tilde{x} \mid \tilde{W})$, the piece required for this form of regression calibration. Then, to obtain
an estimator of $\beta_{Y|\tilde{x}}$, use a generalized linear regression of $Y$ on the predicted values
of $\tilde{x}$, i.e., $\tilde{W}'\hat{\beta}_{\tilde{X}|\tilde{W}}$. They denote this estimator as $\hat{\beta}^{IV1,RC}_{Y|\tilde{x}}$, the first instrumental
variable estimator using regression calibration.

Their second means of utilizing the regression calibration approximation works
directly from the relation in Equation 2.30. For a fixed nonsingular matrix $M_1$,
they define $\hat{\beta}^{-(M_1)}_{\tilde{X}|\tilde{W}} = (\hat{\beta}'_{\tilde{X}|\tilde{W}} M_1 \hat{\beta}_{\tilde{X}|\tilde{W}})^{-1}\hat{\beta}'_{\tilde{X}|\tilde{W}} M_1$. Thus their second estimator is
$\hat{\beta}^{IV1,(M_1)}_{Y|\tilde{x}} = \hat{\beta}^{-(M_1)}_{\tilde{X}|\tilde{W}}\hat{\beta}_{Y|\tilde{W}}$, where $\hat{\beta}_{Y|\tilde{W}}$ is the estimated regression coefficient from
fitting the generalized model to the $(Y, \tilde{W})$ data. They point out that when $W$ and
$X$ are of the same dimension, this second estimator does not depend on $M_1$ and
thus is identical to their first estimator. However, they note that when the number
of instruments is greater than the number of variables measured with error, the
choice of $M_1$ matters. Carroll et al. [13, pp. 117-119] derive an estimate of $M_1$ that
minimizes the asymptotic variance of $\hat{\beta}^{IV1,(M_1)}_{Y|\tilde{x}}$.
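The two-step logic of these estimators can be illustrated with a small simulation. The sketch below is not the implementation of Carroll et al.; it assumes the identity link (ordinary linear regression, the simplest canonical-link GLM), a scalar $x$, no error-free covariates $Z$, and a single instrument, and all function names and parameter values are hypothetical. In this equal-dimension case the regression calibration form and the ratio form implied by Equation 2.30 coincide.

```python
import random


def ols(z, y):
    """Simple linear regression of y on (1, z); returns (intercept, slope)."""
    n = len(y)
    zbar = sum(z) / n
    ybar = sum(y) / n
    szz = sum((a - zbar) ** 2 for a in z)
    szy = sum((a - zbar) * (b - ybar) for a, b in zip(z, y))
    slope = szy / szz
    return ybar - slope * zbar, slope


rng = random.Random(0)
n = 20_000
beta0, beta1 = 1.0, 2.0                              # hypothetical true values
x = [rng.gauss(0, 1) for _ in range(n)]              # unobservable truth
X = [xi + rng.gauss(0, 1) for xi in x]               # error-prone measurement
W = [xi + rng.gauss(0, 1) for xi in x]               # instrument
Y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]

# Naive estimator: regress Y on the error-prone X.
naive = ols(X, Y)[1]

# First estimator: regress X on W, then Y on the predicted values.
a_xw, b_xw = ols(W, X)
x_pred = [a_xw + b_xw * w for w in W]
iv1_rc = ols(x_pred, Y)[1]

# Second form: slope of Y on W divided by slope of X on W.
b_yw = ols(W, Y)[1]
iv1_ratio = b_yw / b_xw

print(naive, iv1_rc, iv1_ratio)
```

Under these assumptions the naive slope is attenuated towards zero, while both IV forms should be close to the hypothetical true slope and agree with each other up to rounding error.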




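Their final estimator makes use of the fact that $X$ and $W$ are both surrogates.
The derivation of this estimator is involved [13, pp. 114-115] and will be omitted
here. In defining this estimator, let $\dim(Z)$ be the number of components of $Z$,
the covariates measured without error. Further define $\beta_{Y|\underline{\tilde{W}}X} = \beta_{Y|\underline{1ZW}X}$ and
$\beta_{Y|\tilde{W}\underline{\tilde{X}}} = (0_{1 \times d}, \beta'_{Y|1ZW\underline{X}})'$, where $d = 1 + \dim(Z)$. Then, for a given matrix $M_2$,
their final IV estimator is $\hat{\beta}^{IV2,(M_2)}_{Y|\tilde{x}} = \hat{\beta}^{-(M_2)}_{\tilde{X}|\tilde{W}}\{\hat{\beta}_{Y|\underline{\tilde{W}}X} + \hat{\beta}_{\tilde{X}|\tilde{W}}\hat{\beta}_{Y|\tilde{W}\underline{\tilde{X}}}\}$. Again, they
[13, pp. 119-120] derived an estimate of $M_2$ that minimizes the asymptotic variance
of $\hat{\beta}^{IV2,(M_2)}_{Y|\tilde{x}}$ for the case $\dim(W) > \dim(X)$.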

They  compared  these  three  IV  estimators  along  with  the  naive  estimator 
in  a logistic  regression  example  where  the  outcome  is  an  indicator  of  coronary 
heart  disease  and  systolic  blood  pressure  is  the  covariate  measured  with  error.  A 
second,  independent  blood  pressure  measurement  taken  from  a second  examination 
serves  as  their  instrumental  variable.  In  their  first  application  of  the  example, 

W and  X are  of  the  same  dimension  so  their  first  two  estimators  are  equivalent. 

In  the  second  application,  W is  taken  as  a two-dimensional  variate  from  two 
measurements  of  blood  pressure  taken  at  the  second  examination,  and  therefore  all 
three  estimates  are  computed.  In  all  cases  their  estimators  aided  in  correcting  for 
the  attenuation  when  compared  to  the  naive  estimate.  The  authors  also  provided 
asymptotic  distribution  approximations  for  their  estimators. 

For further references on the use of instrumental variables in the exponential
family, see, for example, Buzas and Stefanski [10] for the use of IV’s in generalized
linear models, also under canonical links; Thoresen and Laake [62] for the use of
IV’s in a factor analysis approach in logistic regression; and Buzas and Stefanski
[11] for the use of IV’s in a probit ME model.

As mentioned in previous sections on instrumental variable estimation, SIMEX
estimators are not used with instrumental variables since this simulation based
method of estimation relies on knowledge of the measurement error variance.
The rare, and not very practical, exception would be to compare an IV
estimate to a SIMEX estimate.

In  the  following  chapter  we  define  a general  model  that  can  be  applied  to 
most ME models. We call the model the Generalized Simple Measurement Error
model.  In  the  remainder  of  the  dissertation  we  develop  an  estimation  method  for 
its  parameters  and  means  of  making  inferences  on  them.  Further,  we  show  how 
many  known  ME  model  problems  fit  into  the  Generalized  Simple  Measurement 
Error  model  framework  and  can  be  solved  using  its  methods. 


CHAPTER  3 

GENERALIZED  SIMPLE  M.E.  MODEL  WITH  INSTRUMENTAL  VARIABLES 

3.1  The  Model 


We consider a general IV solution to nonlinear measurement error problems
assuming no error free covariates and at least one instrumental variable. We
consider the case of $p$ manifest, or observable, random variables that are related to
a single $x$, the unobservable truth. To allow full generality of our model we must
allow for both categorical and non-categorical manifest variables (outcome variable,
observed value of $x$, and IV’s). Although the model will include a total of $p$ manifest
variables, we start by discussing the outcome(s) of interest, which will be denoted
with subscript $r^*$. So without loss of generality, consider the outcome variable(s),
$\underline{Y}_{r^*}$, to be any one of the $p$ manifest variables, which are denoted $\underline{Y}_r$, $r = 1, 2, \ldots, p$.
We let $\underline{Y}_{r^*}$ be a $p_{r^*}$-dimensional column vector representing the primary outcome
variables of interest (previously denoted by a univariate $Y$ in Chapter 2). Assume
the distribution of $\underline{Y}_{r^*}$ given $x$ is known and that, for the $i$th individual, $i = 1, 2, \ldots, n$, we
observe $\underline{Y}_{ir^*}$. Let $\underline{\theta}_{r^*} = (\theta_{r^*1}, \theta_{r^*2}, \ldots, \theta_{r^*k_{r^*}})' \in \Theta_{r^*}$ be a $k_{r^*} \times 1$ vector of
parameters defining the conditional distribution of $\underline{Y}_{r^*}$. We shall suppress the
notation for individuals, i.e., $i$, for the remainder of this dissertation, except in
instances where it may be required.

Note, for example, that $\underline{\theta}_{r^*}$ might be the canonical parameter if $Y_{r^*}$ is a scalar
random variable with a distribution from the single parameter exponential family,
or it might represent a vector of conditional distribution parameters in the case of,
say, a multivariate regression for $\underline{Y}_{r^*}$, in which case $\underline{Y}_{r^*} = (Y_{r^*1}, Y_{r^*2}, \ldots, Y_{r^*p_{r^*}})'$,
where these variables are conditionally dependent. Further, $\underline{\theta}_{r^*}$ could be a vector of
multinomial cell probabilities in the case of a categorical response.

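For the outcome vector, $\underline{Y}_{r^*}$, assume there exists a $k_{r^*} \times 1$ vector of parameters
that we are interested in modeling, which is denoted by $\underline{\gamma}_{r^*} \in \Gamma_{r^*}$. To do
so, we specify the models

$$
\begin{aligned}
\gamma_{r^*1} &= h_{r^*1}(x; \underline{\beta}_{r^*1}) \\
\gamma_{r^*2} &= h_{r^*2}(x; \underline{\beta}_{r^*2}) \\
&\ \ \vdots \\
\gamma_{r^*k_{r^*}} &= h_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}),
\end{aligned}
\tag{3.1}
$$

or, in vector notation, $\underline{\gamma}_{r^*} = \underline{h}_{r^*}(x; \underline{\beta}_{r^*})$, where $\underline{\beta}_{r^*}$ is the $l_{r^*} \times 1$ vector denoting the
concatenation of the distinct elements of $\underline{\beta}_{r^*1}, \underline{\beta}_{r^*2}, \ldots, \underline{\beta}_{r^*k_{r^*}}$. This is the vector
of model parameters, i.e., those which we intend to estimate, relating $\underline{Y}_{r^*}$ to $x$.
Assume that the elements of $\underline{h}_{r^*}(x; \underline{\beta}_{r^*})$ are differentiable with respect to the
elements of $\underline{\beta}_{r^*}$.

Further assume that, in terms of the original $k_{r^*}$ parameters of the conditional
distribution of $\underline{Y}_{r^*}$,

$$
\begin{aligned}
\gamma_{r^*1} &= T_{r^*1}(\underline{\theta}_{r^*}) \\
\gamma_{r^*2} &= T_{r^*2}(\underline{\theta}_{r^*}) \\
&\ \ \vdots \\
\gamma_{r^*k_{r^*}} &= T_{r^*k_{r^*}}(\underline{\theta}_{r^*}),
\end{aligned}
\tag{3.2}
$$

and that the vector of functions $(T_{r^*1}(\underline{\theta}_{r^*}), T_{r^*2}(\underline{\theta}_{r^*}), \ldots, T_{r^*k_{r^*}}(\underline{\theta}_{r^*}))$ defines a
one-to-one mapping of $\Theta_{r^*}$ onto $\Gamma_{r^*}$. We can express Equation 3.2 in vector notation
as $\underline{\gamma}_{r^*} = \underline{T}_{r^*}(\underline{\theta}_{r^*})$. Since these functions together define a one-to-one and onto
mapping, there exists an inverse mapping defined by a vector of functions and
denoted $\underline{T}^{-1}_{r^*}(\underline{\gamma}_{r^*}) = (T^{-1}_{r^*1}(\underline{\gamma}_{r^*}), T^{-1}_{r^*2}(\underline{\gamma}_{r^*}), \ldots, T^{-1}_{r^*k_{r^*}}(\underline{\gamma}_{r^*}))'$. Given the model
assumptions in Equation 3.1 we may write $\underline{\theta}_{r^*}$ as

$$
\begin{aligned}
\theta_{r^*1} &= T^{-1}_{r^*1}(\underline{h}_{r^*}(x; \underline{\beta}_{r^*})) \\
\theta_{r^*2} &= T^{-1}_{r^*2}(\underline{h}_{r^*}(x; \underline{\beta}_{r^*})) \\
&\ \ \vdots \\
\theta_{r^*k_{r^*}} &= T^{-1}_{r^*k_{r^*}}(\underline{h}_{r^*}(x; \underline{\beta}_{r^*})).
\end{aligned}
\tag{3.3}
$$

Note that the $T^{-1}_{r^*j}(\cdot)$ functions, $j = 1, 2, \ldots, k_{r^*}$, are not inverse functions, but
together define the inverse mapping from $\Gamma_{r^*}$ onto $\Theta_{r^*}$. The elements of the vector
$\underline{T}^{-1}_{r^*}(\underline{h}_{r^*}(x; \underline{\beta}_{r^*}))$ are assumed to be differentiable with respect to the elements of
$\underline{h}_{r^*}(x; \underline{\beta}_{r^*})$.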

These mappings will allow us to substitute the parameters of the conditional
distribution with compound functions of the model parameters in the estimation
procedure described in Chapter 4, thus allowing us to estimate $\underline{\beta}_{r^*}$. In vector
notation we write the mapping defined by Equation 3.3 as $\underline{\theta}_{r^*} = \underline{T}^{-1}_{r^*}(\underline{h}_{r^*}(x; \underline{\beta}_{r^*}))$.
This relationship between conditional distribution parameters and model parameters
may be used to define all the models previously discussed in this
dissertation as well as those that arise in several other important applications.
We shall see later in this chapter how many known models can be defined by the
mappings given in Equation 3.3.
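As a small illustration of these mappings (an example of our own choosing, not one of the applications developed later in this chapter), suppose $Y_{r^*}$ given $x$ is a Bernoulli random variable with success probability $\pi$, so that $k_{r^*} = 1$ and the conditional distribution parameter can be taken to be the canonical parameter $\theta_{r^*} = \log\{\pi/(1 - \pi)\}$. Taking the parameter of interest to be $\gamma_{r^*} = \pi$ and, as an assumed model, a logistic regression function for $h_{r^*}$ gives:

```latex
\begin{align*}
\gamma_{r^*} &= T_{r^*}(\theta_{r^*})
   = \frac{e^{\theta_{r^*}}}{1 + e^{\theta_{r^*}}} = \pi, \\
\gamma_{r^*} &= h_{r^*}(x; \underline{\beta}_{r^*})
   = \frac{\exp(\beta_0 + \beta_1 x)}{1 + \exp(\beta_0 + \beta_1 x)}, \\
\theta_{r^*} &= T^{-1}_{r^*}\left(h_{r^*}(x; \underline{\beta}_{r^*})\right)
   = \log\left\{ \frac{h_{r^*}(x; \underline{\beta}_{r^*})}{1 - h_{r^*}(x; \underline{\beta}_{r^*})} \right\}
   = \beta_0 + \beta_1 x.
\end{align*}
```

Here $T_{r^*}$ is one-to-one from the real line onto $(0, 1)$, and the compound function in the last line expresses the conditional distribution parameter directly in terms of the model parameters $\underline{\beta}_{r^*} = (\beta_0, \beta_1)'$.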

Our model includes a total of $p$ manifest vector variables which are related to
a single, independent variable that is measured with error. In the current chapter,
we shall not include covariates measured without error, i.e., other variables related
to the $p$ manifest vectors which are not contaminated by measurement error. Error
free covariates have previously been denoted by $Z$. Note that adding covariates
measured without error may be an important extension of our model. In many
applications, it may be necessary to control for additional error-free variables. A
discussion of possible future research to include such covariates, as well as
an explanation of how they could potentially be handled in the model, is given in
Chapter 7, along with the summary of this dissertation and other future work.

The manifest variables shall, as mentioned at the beginning of this chapter,
be denoted by the $p_r$-dimensional column vectors $\underline{Y}_r$, $r = 1, 2, \ldots, p$, where the
outcome of interest is one of the $p$ vectors, i.e., $r^* \in \{1, 2, \ldots, p\}$. Let $x$, the
unobservable truth, be continuous with a known density denoted by $f(x \mid \underline{\theta}_x)$. The
dimension of the column vector $\underline{\theta}_x$ is $k_x$.

For illustrative purposes, let us consider a generic application with three
one-dimensional manifest vector variables. One may wish to denote the outcome
of interest, $Y_{r^*}$, as $Y_1$, say. Further, one may denote $Y_2$ as the measurement of
the unobservable $x$ ($X$ in the notation defined in Chapter 2), which is usually,
but not necessarily, considered to be an unbiased measurement, in which case
$Y_2 = x + u$ with $E(u) = 0$. The third observable variable, $Y_3$, may denote
the instrumental variable ($W$ in the notation defined in Chapter 2). Recall from
Chapter 2 that this is the variable providing the additional information needed
to identify the model. Parameters of the above model can
be estimated using these two variables, $Y_2$ and $Y_3$, in addition to the outcome of
interest, $Y_1$.

In other possible applications with more than three manifest variables, one
may think of the extra $\underline{Y}_r$ as additional instrumental variables. We will show that
the model parameters in the models defined for the $p$ manifest vector variables,
$\underline{Y}_1, \underline{Y}_2, \ldots, \underline{Y}_p$, may be estimated utilizing their relations to $x$, whose mappings are
described in vector notation as $T_r(\underline{\theta}_r) = \underline{\gamma}_r = h_r(x; \underline{\beta}_r)$, $r = 1, 2, \ldots, p$.




A key assumption of the methodology that follows is conditional independence
of the $\underline{Y}_r$ given $x$, $r = 1, 2, \ldots, p$. Manifest variables which are not conditionally
independent are grouped into manifest vectors that, together, satisfy the conditional
independence assumption given $x$, when compared to the remaining manifest
variables. This enables the GSME model to handle violations of the conditional
independence assumption and to make full use of the information provided in the
nondiagonal variance-covariance matrix of those variables. We discuss this further
in Chapter 4.

Since every manifest variable may be defined using mappings similar to
those for $\underline{Y}_{r^*}$ in Equation 3.3, we may further let $\underline{\beta}_r$ be the $l_r$ distinct parameters
that describe the relation between $\underline{Y}_r$ and $x$, for each $r$. Also, since each $\underline{Y}_r$ has a
known conditional distribution, each variable has $k_r$ distributional parameters
denoted by a vector $\underline{\theta}_r$. The relationship between these $\underline{\theta}_r$ and the model parameters,
when solved for the distributional parameters, may be written in vector notation as

$$\underline{\theta}_r = T_r^{-1}(h_r(x; \underline{\beta}_r)), \quad r = 1, 2, \ldots, p. \qquad (3.4)$$

Assume that the mappings $T_r^{-1}(h_r(x; \underline{\beta}_r))$ and $h_r(x; \underline{\beta}_r)$ have the same properties
as those stated above for $\underline{Y}_{r^*}$.

In the next chapter, we will see why it is convenient to be able to write the
relation between all distributional parameters and all the model parameters as one
vector. Concatenation of all of the column vectors of mappings, i.e., Equation 3.4
for all $r$, for the conditional distributional parameters of all the manifest variables
and for those of $x$ shall be written in one $(k_1 + k_2 + \cdots + k_p + k_x) \times 1$ vector as

$$\underline{\theta} = T^{-1}(h(x; \underline{\beta})), \qquad (3.5)$$

where the $(l_1 + l_2 + \cdots + l_p + k_x) \times 1$ vector of model parameters is denoted by
$\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \ldots, \underline{\beta}_p', \underline{\theta}_x')'$.

This vector of parameters, $\underline{\beta}$, shall be referred to as the “model parameters” since
these are the parameters we intend to estimate, not to be confused with the vectors
$\underline{\gamma}_r$, whose elements are the parameters we are interested in modeling, for each $\underline{Y}_r$.

Further, the last $k_x$ elements in $\underline{\theta}$ are the distributional parameters for $x$, which
require no mappings, and therefore are simply equal to the last $k_x$ elements in $\underline{\beta}$,
which are of course $\underline{\theta}_x$. The mappings $T^{-1}(h(x; \underline{\beta}))$ and $h(x; \underline{\beta})$ will be written
in shorthand notation as $T^{-1}$ and $h$, respectively, whenever possible. We may
similarly abbreviate the mappings $T_r^{-1}(h_r(x; \underline{\beta}_r))$ and $h_r(x; \underline{\beta}_r)$ as
$T_r^{-1}$ and $h_r$, respectively. Since the column vector $T^{-1}$ is simply the concatenation of the $T_r^{-1}$
for all $r$, the entire concatenated vector has the same properties as those
stated for the mappings of the individual manifest variables. The same is
also true for $h$.
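
As an informal illustration of the concatenation in Equation 3.5, the following Python sketch stacks the per-variable mappings $\underline{\theta}_r = T_r^{-1}(h_r(x; \underline{\beta}_r))$ into one vector and appends $\underline{\theta}_x$ unchanged. The function names, and the choice of a normal linear mapping for both manifest variables, are assumptions made only for this illustration; they are not part of the estimation procedure of Chapter 4.

```python
from typing import Callable, List, Sequence

# A per-variable mapping takes x and beta_r and returns theta_r as a list.
Mapping = Callable[[float, Sequence[float]], List[float]]


def linear_normal(x: float, beta_r: Sequence[float]) -> List[float]:
    """Hypothetical mapping: theta_r = (mean, variance) of a normal Y_r given x."""
    intercept, slope, variance = beta_r
    return [intercept + slope * x, variance]


def stack_theta(
    x: float,
    mappings: Sequence[Mapping],
    betas: Sequence[Sequence[float]],
    theta_x: Sequence[float],
) -> List[float]:
    """Concatenate theta_1, ..., theta_p and then theta_x, as in Equation 3.5."""
    theta: List[float] = []
    for mapping, beta_r in zip(mappings, betas):
        theta.extend(mapping(x, beta_r))
    # The parameters of the density of x require no mapping.
    theta.extend(theta_x)
    return theta
```

With two manifest variables of this kind, the stacked vector has $k_1 + k_2 + k_x = 2 + 2 + 2$ elements, and its last $k_x$ elements coincide with the last $k_x$ elements of $\underline{\beta}$.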

This model shall be called the generalized simple measurement error (henceforth,
GSME) model; simple because there is one covariate measured with error.

By definition of a ME model, the GSME model therefore requires $p \geq 2$. This
is to allow for an outcome of interest which is related to $x$ and a measurement of
$x$ which is contaminated with error. The identifying information for our GSME
model comes through the addition of at least one instrumental variable. In Chapter
5 we will show how our motivating example based on problems faced by the RERF
on estimated radiation dose, discussed in Chapter 1, fits into the GSME model
framework. In the next section of this chapter, we show by example how many
well-studied ME models can be defined using a GSME model.

3.2  Applications  of  the  GSME  Model 

In this section, we will show how examples previously discussed in this
dissertation, and some others of interest, fit into the GSME model setup. The
examples considered describe scenarios for $\underline{Y}_{r^*}$, the outcome of interest; however,
they are possible examples for any of the $\underline{Y}_r$. A strength of the GSME model
is the numerous and varied possibilities for its application. One GSME model
could contain any combination of the examples presented below for its $p$ manifest
variables.

3.2.1  Exponential  Family  General  Linear  Models 

One way a variable in the general exponential family may be modeled is
through the well known Generalized Linear Model (GLM). Consider a GLM for $p_{r^*} = 1$
and $Y_{r^*}$ in the single parameter exponential family. We assume that $f(y_{r^*} \mid x; \underline{\theta}_{r^*})$
is known, where $\underline{\theta}_{r^*} = (\theta_{r^*}, \phi_{r^*})'$, $\theta_{r^*}$ is the canonical parameter, and $\phi_{r^*}$ is the
dispersion parameter which, here, is assumed to be known. As we have seen in
Subsection 2.4.3 of Chapter 2,

$$f(y_{r^*} \mid x; \underline{\theta}_{r^*}) = \exp\left[ \{ y_{r^*}\theta_{r^*} - b_{r^*}(\theta_{r^*}) \} / a_{r^*}(\phi_{r^*}) + c_{r^*}(y_{r^*}, \phi_{r^*}) \right].$$

In this case, we know that we are modeling $E(y_{r^*} \mid x; \underline{\theta}_{r^*}) = \mu_{r^*}(\eta_{r^*})$, where
$\eta_{r^*} = \beta_{r^*0} + x\beta_{r^*1}$ and $\mu_{r^*}(\eta_{r^*})$ is specified through the choice of the link function.
The parameters from Equation 3.1, in terms of the elements of $h_{r^*}$, which we are
interested in modeling, are

$$\gamma_{r^*1} = h_{r^*1}(\eta_{r^*}) = E(Y_{r^*} \mid x) = \mu_{r^*}(\eta_{r^*}),$$

$$\gamma_{r^*2} = h_{r^*2}(x; \beta_{r^*2}) = \beta_{r^*2},$$

where $\beta_{r^*2}$ is a constant which does not depend on $x$. Therefore the vector of
model parameters is $\underline{\beta}_{r^*} = (\beta_{r^*0}, \beta_{r^*1}, \beta_{r^*2})'$.

The corresponding $T_{r^*}(\underline{\theta}_{r^*})$ mappings from Equation 3.2 are

$$\gamma_{r^*1} = T_{r^*1}(\underline{\theta}_{r^*}) = b'_{r^*}(\theta_{r^*}),$$

$$\gamma_{r^*2} = T_{r^*2}(\underline{\theta}_{r^*}) = \phi_{r^*}.$$

Therefore, $\theta_{r^*} = (b'_{r^*})^{-1}(\mu_{r^*}(\eta_{r^*}))$, and this is the relation between the conditional
distribution parameter, $\theta_{r^*}$, and the model parameters $\beta_{r^*0}$ and $\beta_{r^*1}$ that will be
utilized in the estimation procedure which is discussed in Chapter 4. Further, we
have $\phi_{r^*} = \beta_{r^*2}$. Thus we have established the mappings which were defined in
Equation 3.3.
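
To make the relation $\theta_{r^*} = (b'_{r^*})^{-1}(\mu_{r^*}(\eta_{r^*}))$ concrete, the following Python sketch works through one possible case. It assumes, purely for illustration, a Bernoulli outcome (so that $b(\theta) = \log(1 + e^{\theta})$ and $b'(\theta)$ is the logistic function) combined with a probit link; neither choice is made in the text, and the function names are hypothetical.

```python
import math


def probit_mean(eta: float) -> float:
    """Mean function mu(eta) under a probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def b_prime(theta: float) -> float:
    """b'(theta) for the Bernoulli family, where b(theta) = log(1 + exp(theta))."""
    return 1.0 / (1.0 + math.exp(-theta))


def b_prime_inverse(mu: float) -> float:
    """Inverse of b': the logit of the mean."""
    return math.log(mu / (1.0 - mu))


def canonical_parameter(x: float, beta0: float, beta1: float) -> float:
    """theta = b'^{-1}(mu(eta)) with linear predictor eta = beta0 + x * beta1."""
    eta = beta0 + x * beta1
    return b_prime_inverse(probit_mean(eta))
```

Because the link here is not the canonical one, $\theta_{r^*}$ is a genuinely nonlinear function of the linear predictor; applying $b'$ to the result recovers the modeled mean $\mu_{r^*}(\eta_{r^*})$.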

Note  that  there  are  no  restrictions  on  the  nonlinear  mean  function  for  variables 
in  the  general  exponential  family.  For  further  details  on  the  exponential  family 
modeled  via  GLM’s,  see  Subsection  2.4.3,  where  the  model  was  first  introduced,  or, 
for  example,  McCullagh  and  Nelder  [40]. 

3.2.2  Multinomial  Model 

There are many ME modeling situations, e.g., logistic and probit regression
models, which require application to categorical outcomes. Therefore the ability to
be applied to categorical data is of paramount importance for the GSME model.
The outcome of interest in this example is categorical, having $C_{r^*}$ categories. Here
$Y_{r^*}$ has known conditional cell probabilities given by $P(Y_{r^*} = j_{r^*} \mid x, \underline{\theta}_{r^*}) = \theta_{r^*s}$,
$j_{r^*} = 1, 2, \ldots, C_{r^*}$, $s = 1, 2, \ldots, k_{r^*} = C_{r^*} - 1$, and we will write the $\theta_{r^*s}$ as $\pi_{r^*s}$ since
they represent conditional “probabilities”. There are only required to be a total of
$k_{r^*} = C_{r^*} - 1$ distributional parameters since $\pi_{r^*C_{r^*}} = 1 - \sum_{s=1}^{k_{r^*}} \pi_{r^*s}$.
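
Modeling the Means

If we are interested in modeling the mean, in Equation 3.1 we have

$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \\ \vdots \\ \gamma_{r^*k_{r^*}} \end{bmatrix} = \begin{bmatrix} h_{r^*1}(x; \underline{\beta}_{r^*1}) \\ h_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ h_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}) \end{bmatrix} = \begin{bmatrix} \mu_{r^*1}(x; \underline{\beta}_{r^*1}) \\ \mu_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ \mu_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}) \end{bmatrix},$$

where one must specify the functional form of the $\mu_{r^*s}(x; \underline{\beta}_{r^*s})$ functions,
$s = 1, 2, \ldots, k_{r^*}$. Further, for the relations defined in Equation 3.2 we have that

$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \\ \vdots \\ \gamma_{r^*k_{r^*}} \end{bmatrix} = \begin{bmatrix} T_{r^*1}(\underline{\theta}_{r^*}) \\ T_{r^*2}(\underline{\theta}_{r^*}) \\ \vdots \\ T_{r^*k_{r^*}}(\underline{\theta}_{r^*}) \end{bmatrix} = \begin{bmatrix} \pi_{r^*1} \\ \pi_{r^*2} \\ \vdots \\ \pi_{r^*k_{r^*}} \end{bmatrix}.$$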

Therefore the $T_{r^*}^{-1}$ mapping in Equation 3.3 is the identity mapping. So, for example,
if we modeled the mean as a logistic function we would have the conditional
distributional parameters written in terms of the model parameters as

$$\pi_{r^*s} = \mu_{r^*s}(x; \underline{\beta}_{r^*s}) = \frac{\exp(\beta_{r^*0s} + x\beta_{r^*1s})}{1 + \exp(\beta_{r^*0s} + x\beta_{r^*1s})},$$

$s = 1, 2, \ldots, k_{r^*}$, where $\underline{\beta}_{r^*s} = (\beta_{r^*0s}, \beta_{r^*1s})'$ and $\beta_{r^*0s}$ and $\beta_{r^*1s}$ are the intercept
and slope parameters, respectively, relating the conditional mean of $Y_{r^*}$ to $x$ for
the $s$th conditional cell probability.

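Modeling the Generalized Odds

Another example when the primary outcome of interest is a multinomial
response could be to model the generalized odds as the functions of interest. In this
case, we have the functions in Equation 3.1

$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \\ \vdots \\ \gamma_{r^*k_{r^*}} \end{bmatrix} = \begin{bmatrix} h_{r^*1}(x; \underline{\beta}_{r^*1}) \\ h_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ h_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}) \end{bmatrix} = \begin{bmatrix} GO_{r^*1}(x; \underline{\beta}_{r^*1}) \\ GO_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ GO_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}) \end{bmatrix},$$

where $GO_{r^*s}(x; \underline{\beta}_{r^*s})$ is defined to be the probability of category $s$ divided by the
probability of the reference category, i.e., category $C_{r^*}$, say, where the probabilities
are written in terms of $x$ and $\underline{\beta}_{r^*s}$, e.g., a logistic function of the linear predictor.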


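The mappings of the conditional distributional parameters, which in this case are
conditional cell probabilities, for use in Equation 3.2, are:

$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \\ \vdots \\ \gamma_{r^*k_{r^*}} \end{bmatrix} = \begin{bmatrix} T_{r^*1}(\underline{\theta}_{r^*}) \\ T_{r^*2}(\underline{\theta}_{r^*}) \\ \vdots \\ T_{r^*k_{r^*}}(\underline{\theta}_{r^*}) \end{bmatrix} = \begin{bmatrix} \pi_{r^*1} / \pi_{r^*C_{r^*}} \\ \pi_{r^*2} / \pi_{r^*C_{r^*}} \\ \vdots \\ \pi_{r^*k_{r^*}} / \pi_{r^*C_{r^*}} \end{bmatrix},$$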

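where recall that $k_{r^*} = C_{r^*} - 1$. Solving for the parameters of the distribution in
terms of $x$ and $\underline{\beta}_{r^*}$ to define the mappings in Equation 3.3 results in

$$\begin{aligned}
\pi_{r^*1} &= \pi_{r^*C_{r^*}} \, GO_{r^*1}(x; \underline{\beta}_{r^*1}) \\
\pi_{r^*2} &= \pi_{r^*C_{r^*}} \, GO_{r^*2}(x; \underline{\beta}_{r^*2}) \\
&\;\;\vdots \\
\pi_{r^*k_{r^*}} &= \pi_{r^*C_{r^*}} \, GO_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}) \\
\pi_{r^*C_{r^*}} &= \pi_{r^*C_{r^*}} \, GO_{r^*C_{r^*}}(x; \underline{\beta}_{r^*C_{r^*}}),
\end{aligned}$$

where $GO_{r^*C_{r^*}}(x; \underline{\beta}_{r^*C_{r^*}}) = 1$. Since we know that the sum over the left hand side
of the previous equation is one, we have that
$\pi_{r^*C_{r^*}} + \pi_{r^*C_{r^*}} \sum_{t=1}^{k_{r^*}} GO_{r^*t}(x; \underline{\beta}_{r^*t}) = 1$.
This implies that

$$\pi_{r^*C_{r^*}} = \frac{1}{1 + \sum_{t=1}^{k_{r^*}} GO_{r^*t}(x; \underline{\beta}_{r^*t})}$$

and that, for all $s$,

$$\pi_{r^*s} = \frac{GO_{r^*s}(x; \underline{\beta}_{r^*s})}{1 + \sum_{t=1}^{k_{r^*}} GO_{r^*t}(x; \underline{\beta}_{r^*t})},$$

which defines the $T_{r^*}^{-1}$ mappings in Equation 3.3.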
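
The inversion from generalized odds back to cell probabilities can be sketched in a few lines of Python. The log-linear form $GO_{r^*s} = \exp(\beta_{r^*0s} + x\beta_{r^*1s})$ used in `generalized_odds` is only one possible specification and is an assumption of this sketch, as are the function names; `cell_probabilities` implements the $T_{r^*}^{-1}$ mapping derived above for any positive odds.

```python
import math
from typing import List, Sequence, Tuple


def generalized_odds(x: float, betas: Sequence[Tuple[float, float]]) -> List[float]:
    """Hypothetical specification: GO_s = exp(intercept_s + x * slope_s), s = 1..k."""
    return [math.exp(b0 + x * b1) for b0, b1 in betas]


def cell_probabilities(odds: Sequence[float]) -> List[float]:
    """Map k generalized odds to the C = k + 1 conditional cell probabilities.

    The last entry is the reference category, whose odds equal one.
    """
    denominator = 1.0 + sum(odds)
    probabilities = [go / denominator for go in odds]
    probabilities.append(1.0 / denominator)
    return probabilities
```

For instance, odds of 1 and 2 against the reference category give cell probabilities 0.25, 0.5 and 0.25, which sum to one as required.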

3.2.3  Survival  Time 

In survival analysis problems, also known as the analysis of failure time data
[33], the outcome, $Y_{r^*}$, is the time until an “event” of interest occurs and we
model $Y_{r^*} = T$ = survival time. It is standard to let $T$ represent the outcome
in survival analysis problems; however, for consistency within this dissertation, we
retain our notation, $Y_{r^*}$. As mentioned in Chapter 2, when we first introduced the
concept of regression calibration, Prentice [44] modeled the hazard function in a
survival analysis measurement error problem. Therefore, we know survival analysis
problems are an important application of our GSME model.

For the purposes of this example, suppose that $Y_{r^*}$ is the time until the
diagnosis of leukemia for Atomic-bomb survivors (see Chapter 1). The model of
interest is the hazard function of $Y_{r^*}$ as a function of $x$, true radiation dose. A very
general definition of a hazard function is a function which describes the chances
that an individual in the study observes an “event” in the next instant of time (see,
for example, [35, 33]). A complication arises in this example due to the fact that
an “event”, or diagnosis of leukemia, may be the result of background causes or of
radiation dose, $x$. In other words, there will be a background rate of occurrences
that is independent of exposure, $x$. To incorporate this reality into our modeling
framework, we write the hazard function as the sum of two different hazards: one
a function of the unknown $x$ and the other a background hazard independent of $x$.
We show, in detail, in Chapter 5 how this motivating example can be set up to fit
into the GSME model framework.

So, one drawback of the GSME model is that it is not generally applicable to
all survival analysis problems. The survival analysis problems that may fit into the
framework of the GSME model would most likely need to be modified or adjusted
in some manner in order to do so. It would seem that survival analysis problems
would need to be handled on a case by case basis, with special care taken
to meet the requirements of the GSME model.




3.2.4  Multivariate  Regression  Model 

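For multivariate nonlinear regression we model a $p_{r^*}$-dimensional, multivariate
response, $\underline{Y}_{r^*}$. The model is:

$$\underline{Y}_{r^*} = \begin{bmatrix} Y_{r^*1} \\ Y_{r^*2} \\ \vdots \\ Y_{r^*p_{r^*}} \end{bmatrix} = \begin{bmatrix} \mu_{r^*1}(x; \underline{\beta}_{r^*1}) \\ \mu_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ \mu_{r^*p_{r^*}}(x; \underline{\beta}_{r^*p_{r^*}}) \end{bmatrix} + \underline{\epsilon}_{r^*},$$

where $\mu_{r^*s}(.)$ is an arbitrary nonlinear mean function for the $s$th variable,
$s = 1, 2, \ldots, p_{r^*}$, and we assume $E(\underline{\epsilon}_{r^*}) = \underline{0}$ and $Var(\underline{\epsilon}_{r^*}) = \Sigma_{\epsilon_{r^*}}$. Conditional
dependence in the multivariate regression model means that $\Sigma_{\epsilon_{r^*}}$ is not
a diagonal matrix. The elements on the diagonal are the variances of the $Y_{r^*s}$,
$s = 1, 2, \ldots, p_{r^*}$, and the $ij$th element is the covariance between $Y_{r^*i}$ and $Y_{r^*j}$, for
$i \neq j$.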

In multivariate linear regression models, the $\mu_{r^*s}(.)$ are linear functions. The
linear model in matrix form is

$$\underline{Y}_{r^*} = B_{r^*} \underline{X} + \underline{\epsilon}_{r^*},$$

where $\underline{Y}_{r^*}$ is a $p_{r^*} \times 1$ vector, $B_{r^*}$ is a $p_{r^*} \times 2$ matrix, and $\underline{X} = (1, x)'$ is a $2 \times 1$
vector. The $s$th equation would be $Y_{r^*s} = \beta_{r^*0s} + x\beta_{r^*1s} + \epsilon_{r^*s}$, $s = 1, 2, \ldots, p_{r^*}$.
Assume that $f_{r^*}(\underline{y}_{r^*} \mid x; \underline{\theta}_{r^*})$ is known and recall that the dimension of $\underline{\theta}_{r^*}$ is $k_{r^*} \times 1$
and it contains the parameters $\underline{\mu}$ (i.e., the means) and the distinct elements of the
variance-covariance matrix $\Sigma_{\epsilon_{r^*}}$, say $\sigma_{ab}$, for all $a = 1, 2, \ldots, p_{r^*}$, $b = 1, 2, \ldots, p_{r^*}$,
and $a \leq b$, where $\sigma_{ab}$ is a variance when $a = b$, and a covariance when not. For
the linear model, the functions of $x$ and $\underline{\beta}_{r^*}$ that we wish to model, in terms of the
functions for Equation 3.1, are


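$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \\ \vdots \\ \gamma_{r^*p_{r^*}} \\ \gamma_{r^*p_{r^*}+1} \\ \vdots \\ \gamma_{r^*k_{r^*}} \end{bmatrix} = \begin{bmatrix} h_{r^*1}(x; \underline{\beta}_{r^*1}) \\ h_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ h_{r^*p_{r^*}}(x; \underline{\beta}_{r^*p_{r^*}}) \\ h_{r^*p_{r^*}+1}(x; \underline{\beta}_{r^*p_{r^*}+1}) \\ \vdots \\ h_{r^*k_{r^*}}(x; \underline{\beta}_{r^*k_{r^*}}) \end{bmatrix} = \begin{bmatrix} \beta_{r^*01} + x\beta_{r^*11} \\ \beta_{r^*02} + x\beta_{r^*12} \\ \vdots \\ \beta_{r^*0p_{r^*}} + x\beta_{r^*1p_{r^*}} \\ \beta_{r^*p_{r^*}+1} \\ \vdots \\ \beta_{r^*k_{r^*}} \end{bmatrix},$$

where the $\beta_{r^*p_{r^*}+1}, \beta_{r^*p_{r^*}+2}, \ldots, \beta_{r^*k_{r^*}}$ are constants and $k_{r^*}$ is a number chosen so that
the dimension of the sub-vector $(\gamma_{r^*p_{r^*}+1}, \gamma_{r^*p_{r^*}+2}, \ldots, \gamma_{r^*k_{r^*}})'$ is equal to the
number of distinct variances and covariances $\sigma_{ab}$, $a = 1, 2, \ldots, p_{r^*}$, $b = 1, 2, \ldots, p_{r^*}$,
$a \leq b$. The corresponding mappings of the distributional parameters in Equation
3.2 are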


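$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \\ \vdots \\ \gamma_{r^*p_{r^*}} \\ \gamma_{r^*p_{r^*}+1} \\ \vdots \\ \gamma_{r^*k_{r^*}} \end{bmatrix} = \begin{bmatrix} T_{r^*1}(\underline{\theta}_{r^*}) \\ T_{r^*2}(\underline{\theta}_{r^*}) \\ \vdots \\ T_{r^*p_{r^*}}(\underline{\theta}_{r^*}) \\ T_{r^*p_{r^*}+1}(\underline{\theta}_{r^*}) \\ \vdots \\ T_{r^*k_{r^*}}(\underline{\theta}_{r^*}) \end{bmatrix} = \begin{bmatrix} \mu_{r^*1} \\ \mu_{r^*2} \\ \vdots \\ \mu_{r^*p_{r^*}} \\ \sigma_{r^*p_{r^*}+1} \\ \vdots \\ \sigma_{r^*k_{r^*}} \end{bmatrix}.$$

In writing these mappings of the conditional distribution parameters $\underline{\theta}_{r^*}$, it is easily
seen why the sub-vector of $\gamma_{r^*}$’s, $(\gamma_{r^*p_{r^*}+1}, \gamma_{r^*p_{r^*}+2}, \ldots, \gamma_{r^*k_{r^*}})'$,
must contain enough elements to allow all the distinct elements of the
variance-covariance matrix $\Sigma_{\epsilon_{r^*}}$ to be mapped, along with the means, into the parameter
space $\Gamma_{r^*}$.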
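
The bookkeeping behind $k_{r^*}$ can be checked with a short Python sketch: a $p_{r^*}$-dimensional response has $p_{r^*}$ means plus $p_{r^*}(p_{r^*}+1)/2$ distinct variances and covariances $\sigma_{ab}$ with $a \leq b$. The function names below are hypothetical and serve only to make the count explicit.

```python
from typing import List, Tuple


def distinct_covariance_indices(p: int) -> List[Tuple[int, int]]:
    """Index pairs (a, b), a <= b, of the distinct elements of a p x p covariance matrix."""
    return [(a, b) for a in range(1, p + 1) for b in range(a, p + 1)]


def num_distributional_parameters(p: int) -> int:
    """k = number of means plus number of distinct variances and covariances."""
    return p + len(distinct_covariance_indices(p))
```

For a bivariate response this gives the pairs (1, 1), (1, 2), (2, 2) and hence five distributional parameters: two means, two variances and one covariance.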

One important attribute of this application of the GSME model is that performing
a multivariate regression (linear or nonlinear) may be a potential solution
to the problem which can arise concerning the assumption of conditional independence
of the manifest variables. If the assumption of conditional independence is
violated or in serious question, then grouping those variables that are suspected
to be conditionally dependent together and defining them as one manifest vector
variable, $\underline{Y}_r$, that requires a multivariate regression model, makes use of the extra
information contained in the covariances, without violating the key assumption
of conditional independence.

3.2.5  Nonlinear  Model 

The full generality of the GSME model allows for models with nonlinear mean
functions which are not necessarily functions of a linear relationship between $x$ and
the parameters which describe the relation between $x$ and the outcome of interest.
This is an improvement over generalized linear ME models, discussed, for example,
in Carroll et al. [13], whose mean functions are in terms of a strictly linear relation
between $x$ and the model parameters. For the outcome of interest, the nonlinear
model is given by

$$Y_{r^*} = h_{r^*}(x; \underline{\beta}_{r^*}) + \epsilon_{r^*},$$

where $h_{r^*}(.)$ is an arbitrary nonlinear mean function and $E(\epsilon_{r^*}) = 0$. We regard a
ME model as nonlinear when $\underline{\beta}_{r^*}$ enters the mean function in a nonlinear manner
or when the mean function is nonlinear in $x$. Nonlinear models describe a large
class of models, which may be applied to both continuous and discrete outcomes.
The focus of Section 2.4 was initially on continuous outcomes under the assumption
of normality, before moving on to outcomes in the exponential family modeled
through GLM’s.




Let us first consider a simple nonlinear model, assuming normality, given as

$$Y_{r^*} = \beta_{r^*0} + x\beta_{r^*1} + x^2\beta_{r^*2} + \epsilon_{r^*},$$

where $E(\epsilon_{r^*}) = E(x\epsilon_{r^*}) = E(x^2\epsilon_{r^*}) = 0$. This model was discussed in Section
2.4, where it was shown that a bias in the naive estimators for nonlinear models
exists, and that the bias in the estimator for the $\beta_{r^*2}$ parameter is even greater
than that for the estimator of $\beta_{r^*1}$. Thus, the ability of the GSME model to be
applied to this scenario, and to reduce the bias through its methods,
is of notable importance. Assuming a constant variance for the conditional normal
distribution of $Y_{r^*}$, the parameters which are modeled in terms of the mappings
given by Equation 3.1 are


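$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \end{bmatrix} = \begin{bmatrix} h_{r^*1}(x; \underline{\beta}_{r^*1}) \\ h_{r^*2}(x; \underline{\beta}_{r^*2}) \end{bmatrix} = \begin{bmatrix} \beta_{r^*0} + x\beta_{r^*1} + x^2\beta_{r^*2} \\ \beta_{r^*3} \end{bmatrix},$$

where $\underline{\beta}_{r^*1} = (\beta_{r^*0}, \beta_{r^*1}, \beta_{r^*2})'$ and $\underline{\beta}_{r^*2}$ denotes a $1 \times 1$ vector consisting of the
scalar $\beta_{r^*3}$, which is a constant. Under the current assumptions, the distribution of
$Y_{r^*}$ given $x$ is normal with conditional parameters, the mean, $\mu_{r^*}$, and the constant
variance, $\sigma_{r^*r^*}$. Therefore, $\underline{\theta}_{r^*} = (\mu_{r^*}, \sigma_{r^*r^*})'$. The mappings defined in Equation
3.2, which correspond to those given above, are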


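$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \end{bmatrix} = \begin{bmatrix} T_{r^*1}(\underline{\theta}_{r^*}) \\ T_{r^*2}(\underline{\theta}_{r^*}) \end{bmatrix} = \begin{bmatrix} \mu_{r^*} \\ \sigma_{r^*r^*} \end{bmatrix}.$$

As alluded to in Section 3.1 of this chapter, this permits us to write the conditional
distribution, given here by:

$$f(y_{r^*} \mid x; \underline{\theta}_{r^*}) = \frac{1}{\sqrt{2\pi\sigma_{r^*r^*}}} \exp\left\{ -\frac{(y_{r^*} - \mu_{r^*})^2}{2\sigma_{r^*r^*}} \right\},$$

in terms of the model parameters, i.e., those we ultimately intend to estimate, as:

$$f(y_{r^*} \mid x; \underline{\beta}_{r^*}) = \frac{1}{\sqrt{2\pi\beta_{r^*3}}} \exp\left\{ -\frac{\left( y_{r^*} - (\beta_{r^*0} + x\beta_{r^*1} + x^2\beta_{r^*2}) \right)^2}{2\beta_{r^*3}} \right\},$$

where $\underline{\beta}_{r^*}$, which equals $(\beta_{r^*0}, \beta_{r^*1}, \beta_{r^*2}, \beta_{r^*3})'$, written on the left hand side of the
preceding equation should be understood to be a symbolic representation of the
mappings $T_{r^*}^{-1}(h_{r^*}(x; \underline{\beta}_{r^*}))$. Since the density is a function of $\underline{\beta}_{r^*}$, this will allow
us to estimate the model parameters; how this estimation is
performed is shown in the next chapter.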
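
The substitution just described can be written out directly. The Python sketch below, with hypothetical function names, evaluates the conditional normal density of $Y_{r^*}$ first through $\underline{\theta}_{r^*} = (\mu_{r^*}, \sigma_{r^*r^*})'$ and then as a function of $\underline{\beta}_{r^*} = (\beta_{r^*0}, \beta_{r^*1}, \beta_{r^*2}, \beta_{r^*3})'$; it is an illustration of the mapping only, not of the estimation procedure.

```python
import math
from typing import Sequence, Tuple


def theta_from_beta(x: float, beta: Sequence[float]) -> Tuple[float, float]:
    """T^{-1}(h(x; beta)): conditional mean and variance of Y given x."""
    b0, b1, b2, b3 = beta
    return (b0 + x * b1 + x * x * b2, b3)


def normal_density(y: float, mean: float, variance: float) -> float:
    """Normal density written in terms of the distributional parameters."""
    return math.exp(-((y - mean) ** 2) / (2.0 * variance)) / math.sqrt(
        2.0 * math.pi * variance
    )


def density_in_beta(y: float, x: float, beta: Sequence[float]) -> float:
    """The same density written in terms of the model parameters."""
    mean, variance = theta_from_beta(x, beta)
    return normal_density(y, mean, variance)
```

Evaluating `density_in_beta` at a value of $y_{r^*}$ equal to the quadratic mean returns the peak height $1/\sqrt{2\pi\beta_{r^*3}}$, as expected for a normal density.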

Let us now look at how these mappings are defined for another common
density, where it will not be necessary to specify a model to do so. Assume that $Y_{r^*}$
follows a Gamma distribution. One common way to model a Gamma distribution
is through a GLM; however, the mean of a Gamma distribution is not required to
be a nonlinear function of a linear predictor. The mean could be modeled by some
arbitrary nonlinear function, which is the reason for including this example in the
current subsection on nonlinear models. The “standard form” of the density for a
Gamma distribution, with parameters $\alpha$ and $\lambda$, i.e., Gamma($\alpha$, $\lambda$), is

$$f(y_{r^*} \mid x; \underline{\theta}_{r^*}) = \frac{(y_{r^*})^{\alpha - 1} \lambda^{\alpha} \exp(-\lambda y_{r^*})}{\Gamma(\alpha)},$$

where $\underline{\theta}_{r^*} = (\alpha, \lambda)'$ and for notational convenience, in this example only, we
drop the subscript $r^*$ on the specific distributional parameters. We retain
this notation only for the random variable itself, $y_{r^*}$, and the vector denoting its
conditional distribution parameters, $\underline{\theta}_{r^*}$. In this “standard form”, we have that
$E(y_{r^*}) = \alpha/\lambda$ and $Var(y_{r^*}) = \alpha/\lambda^2$. In order to show how the above model fits
into the GSME model framework, we reparameterize the conditional distribution
so that it is in the form for the exponential family. The reparameterization
$\mu = \nu/\lambda$ and $\nu = \alpha$ results in $E(y_{r^*}) = \mu$ and $Var(y_{r^*}) = \mu^2/\nu$. Often one
may not work with the Gamma density in its “standard form” given above. If
the preference is to write the density in the form of the exponential family, the
previous reparameterization would not be needed and one could directly proceed
with defining the mappings for $h_{r^*}$ and $T_{r^*}$, for the density in the preferred form.
Continuing in the present context, the density function becomes

$$\begin{aligned}
f(y_{r^*} \mid x; \underline{\theta}^{E}_{r^*}) &= \frac{(y_{r^*})^{\nu - 1} (\nu/\mu)^{\nu} \exp(-(\nu/\mu) y_{r^*})}{\Gamma(\nu)} \\
&= \exp\left[ \frac{y_{r^*}(-1/\mu) + \log(1/\mu)}{1/\nu} + \nu \log \nu + (\nu - 1) \log y_{r^*} - \log \Gamma(\nu) \right],
\end{aligned}$$

where $\underline{\theta}^{E}_{r^*}$ represents the conditional distribution parameters of the density in this
form (the superscript indicates a different set of conditional parameters than
those in $\underline{\theta}_{r^*}$ for the “standard” density) with elements which are the canonical
parameter, $\theta = -1/\mu$, and $\phi = 1/\nu$. In the notation for the exponential family
we have that $a(\phi) = \phi$, $b(\theta) = -\log(-\theta)$, and
$c(y_{r^*}, \phi) = (1/\phi) \log(1/\phi) + ((1/\phi) - 1) \log y_{r^*} - \log \Gamma(1/\phi)$.
Further, we know that $E(y_{r^*}) = b'(\theta) = -1/\theta = \mu$
and $Var(y_{r^*}) = a(\phi) b''(\theta) = \phi(1/\theta^2) = (1/\nu)\mu^2$. In this parameterization, we have
as our $h_{r^*}(x; \underline{\beta}_{r^*})$ functions from Equation 3.1:


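$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \end{bmatrix} = \begin{bmatrix} h_{r^*1}(x; \underline{\beta}_{r^*1}) \\ h_{r^*2}(x; \underline{\beta}_{r^*2}) \end{bmatrix} = \begin{bmatrix} \mu(x; \underline{\beta}_{r^*1}) \\ \nu \end{bmatrix},$$

where $\nu = \beta_{r^*2}$ is a constant scalar and $\mu(.)$ is the nonlinear mean function written
in terms of $x$ and $\underline{\beta}_{r^*1}$, which, again, is not required to be written as a function of
a linear predictor. In practice, one specifies a form for the mean function. For the
$T_{r^*}$ mappings defined in Equation 3.2, we have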


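$$\begin{bmatrix} \gamma_{r^*1} \\ \gamma_{r^*2} \end{bmatrix} = \begin{bmatrix} T_{r^*1}(\underline{\theta}^{E}_{r^*}) \\ T_{r^*2}(\underline{\theta}^{E}_{r^*}) \end{bmatrix} = \begin{bmatrix} b'(\theta) \\ 1/\phi \end{bmatrix},$$

where, again, $b'(\theta) = -1/\theta$. Using these mappings we can solve for the elements of
$\underline{\theta}^{E}_{r^*} = (\theta, \phi)'$, giving $\theta = -1/\mu(x; \underline{\beta}_{r^*1})$ and $\phi = 1/\nu$, which is a constant. Again,
if it is not desired to use the “standard” density for a Gamma distribution, the
definition of the model mappings stops here. Otherwise, if there is a preference
to work with its “standard” density, we reparameterize back to the parameters
of the “standard form” of the density. In terms of the parameters from the
original density, $\underline{\theta}_{r^*} = (\alpha, \lambda)'$, for Equation 3.3, i.e., the elements of the mapping
$\underline{\theta}_{r^*} = T_{r^*}^{-1}(h_{r^*}(x; \underline{\beta}_{r^*}))$, we have $\lambda = \alpha/\mu(x; \underline{\beta}_{r^*1})$ and $\alpha = \beta_{r^*2}$, a constant.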
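
The two parameterizations of the Gamma example can be compared numerically. In the Python sketch below, the saturating mean function $\mu(x) = \beta_0 x / (\beta_1 + x)$ is a hypothetical choice, picked only because it is not a function of a linear predictor; the function names are likewise assumptions of the sketch. The remaining functions implement the mappings stated above: $\theta = -1/\mu$, $\phi = 1/\nu$, $\alpha = \nu$ and $\lambda = \alpha/\mu$.

```python
from typing import Tuple


def mean_function(x: float, b0: float, b1: float) -> float:
    """Hypothetical nonlinear mean, valid for x > 0 and b0, b1 > 0."""
    return b0 * x / (b1 + x)


def exponential_family_parameters(mu: float, nu: float) -> Tuple[float, float]:
    """Canonical parameter theta = -1/mu and dispersion phi = 1/nu."""
    return (-1.0 / mu, 1.0 / nu)


def standard_parameters(mu: float, nu: float) -> Tuple[float, float]:
    """Standard-form parameters alpha = nu and lambda = alpha / mu."""
    alpha = nu
    return (alpha, alpha / mu)
```

With $\mu = 2$ and $\nu = 4$, the standard parameters are $\alpha = 4$ and $\lambda = 2$, so that $\alpha/\lambda = \mu$ and $\alpha/\lambda^2 = \mu^2/\nu$, in agreement with the moments given earlier in this example.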


3.3  Model  Identification 

As  first  mentioned  in  Chapter  2,  an  important  question  that  arises  with  the 
study  of  ME  models  is  the  identification  of  a model’s  parameters.  Again,  if  the 
parameters  of  a model  are  identified,  we  may  also  say  the  model  itself  is  identified. 
To  consider  the  identification  of  the  GSME  model,  we  will  need  to  make  reference 
to  the  joint  marginal  distribution  of  the  manifest  variables.  Due  to  the  assumption 
of  conditional  independence,  this  joint  distribution  is  given  by 

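$$f(\underline{y}_1, \underline{y}_2, \ldots, \underline{y}_p) = \int_x \prod_{r=1}^{p} f_r(\underline{y}_r \mid x; \underline{\theta}_r) \, f(x \mid \underline{\theta}_x) \, dx,$$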


where $\int_x$ denotes integration over $x$ and $f_r(\underline{y}_r \mid x; \underline{\theta}_r)$ is a generic representation
for the distribution function of the $r$th manifest variable, which may be
a combination of variables which are grouped together due to a violation of the
conditional independence assumption. Recall that $\underline{\theta}_r$ are the conditional distribution
parameters for the $r$th manifest variable. This joint marginal distribution,
the variables which make it up, and the conditional distributions of the $\underline{Y}_r$ are
discussed in detail in Chapter 4. Further, we may write the joint marginal distribution
as $f(\underline{y}_1, \underline{y}_2, \ldots, \underline{y}_p) = f_Y(\underline{y}; \underline{\omega})$, where $(\underline{y}_1', \underline{y}_2', \ldots, \underline{y}_p')' = \underline{y}$ and $\underline{\omega}$ is the
vector of parameters of the joint distribution that lie in a parameter space $\Omega$. It is
seen that these joint distributional parameters must be in terms of the parameters
$\underline{\theta}_x$ and $\underline{\theta}_r$, for all $r$. Further, due to the mappings defined by Equation 3.5, i.e.,
$\underline{\theta} = T^{-1}(h(x; \underline{\beta}))$, this is equivalent to saying that the elements of $\underline{\omega}$ must also be
in terms of the model parameters, i.e., the elements of $\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \ldots, \underline{\beta}_p', \underline{\theta}_x')'$.
Since $\underline{\beta}$ is the vector of parameters that must ultimately be estimated, these are
the parameters we must consider in discussing identification. The parameter space
of $\underline{\beta}$ is denoted $B$. By the preceding discussion, there exists a mapping from $B$
onto $\Omega$ (the image of $B$) such that every parameter vector $\underline{\beta}$ in $B$ gets mapped
onto an $\underline{\omega}$ in $\Omega$, which defines the parameter space of $f(\underline{y}_1, \underline{y}_2, \ldots, \underline{y}_p)$. Denote this
mapping by $\underline{\omega}(\underline{\beta}) : B \to \Omega$.
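
The joint marginal density given at the start of this section can be evaluated numerically in simple cases. As a sketch only, the Python code below takes the simple linear ME model with normal errors ($Y_1 = \beta_{10} + \beta_{11}x + \epsilon$, $Y_2 = x + u$, $x$ normal), integrates the product of the conditional densities over $x$ with a trapezoid rule, and offers a bivariate normal density for comparison. The closed form relies on the standard fact that $(Y_1, Y_2)$ is bivariate normal when $x$, $\epsilon$ and $u$ are mutually independent, which is assumed here; the function names, grid width and step count are arbitrary choices.

```python
import math
from typing import Sequence


def normal_pdf(v: float, mean: float, var: float) -> float:
    return math.exp(-((v - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_marginal(y1: float, y2: float, beta: Sequence[float],
                   n_steps: int = 4000, width: float = 10.0) -> float:
    """Trapezoid approximation of the integral over x of f(y1|x) f(y2|x) f(x)."""
    b10, b11, var_e, var_u, mu_x, var_x = beta
    sd_x = math.sqrt(var_x)
    lower = mu_x - width * sd_x
    step = 2.0 * width * sd_x / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = lower + i * step
        value = (normal_pdf(y1, b10 + b11 * x, var_e)
                 * normal_pdf(y2, x, var_u)
                 * normal_pdf(x, mu_x, var_x))
        total += (0.5 if i in (0, n_steps) else 1.0) * value
    return total * step


def bivariate_normal_pdf(y1: float, y2: float, m1: float, m2: float,
                         s11: float, s12: float, s22: float) -> float:
    """Bivariate normal density with means (m1, m2) and covariance [[s11, s12], [s12, s22]]."""
    det = s11 * s22 - s12 * s12
    d1, d2 = y1 - m1, y2 - m2
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))
```

For $\underline{\beta} = (1, 2, 1, 1, 3, 2)'$ the implied joint parameters are means 7 and 3, variances 9 and 3, and covariance 4, so the two functions can be compared at any point $(y_1, y_2)$.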

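Applying Fuller’s [24] definition of identifiability, stated earlier in Chapter 2,
to the GSME model would translate to the vector $\underline{\beta}$ being identified if, for any
$\underline{\beta} \in B$ and $\underline{\beta}^* \in B$, $\underline{\beta} \neq \underline{\beta}^*$, there exists an arbitrary vector $\underline{a}$, with elements
$(\underline{a}_1', \underline{a}_2', \ldots, \underline{a}_p')'$, such that $f_Y(\underline{a}; \underline{\omega}(\underline{\beta})) \neq f_Y(\underline{a}; \underline{\omega}(\underline{\beta}^*))$.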

A necessary condition for $\underline{\beta}$ to be identified is $\dim(\underline{\omega}) \geq \dim(\underline{\beta})$, where $\dim(.)$
is the dimension of the given vector. Under the present GSME model, it will often
be impossible to determine $\dim(\underline{\omega})$ since $f(\underline{y}_1, \underline{y}_2, \ldots, \underline{y}_p)$, the joint distribution of
the manifest variables, is usually unknown and therefore $\underline{\omega}$ is unknown.

The  following  two  theorems  provide  necessary  and  sufficient  conditions  under 
which  the  parameters  of  the  GSME  model  are  identified.  First  a lemma  that  is 
used  in  the  proofs  of  these  theorems. 

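LEMMA 1: If the parameters, $\underline{\omega}$, of the joint distribution $f_Y(\underline{y}; \underline{\omega})$ are
identified, then the mapping, $\underline{\omega}(\underline{\beta})$, which maps the parameter space $B$ onto the
parameter space $\Omega$, must be a function, i.e., each $\underline{\beta} \in B$ is mapped into one and
only one vector of $\Omega$, where $\underline{\beta} \in B$ and $\underline{\omega}(\underline{\beta}) \in \Omega$.

Proof: Assume not. Then there exists a $\underline{\beta}$ that is mapped into two different
$\underline{\omega}$’s, say $\underline{\omega}_1$ and $\underline{\omega}_2$. Since $\underline{\omega}_1 = \underline{\omega}(\underline{\beta})$ and $\underline{\omega}_2 = \underline{\omega}(\underline{\beta})$, i.e., both are images of
the same $\underline{\beta}$, the densities of those vectors must be equal. In other words, we have
$f_Y(\underline{a}; \underline{\omega}_1) = f_Y(\underline{a}; \underline{\omega}_2)$, for all $\underline{a}$. This contradicts the identifiability of $\underline{\omega}$. ◊

(Note that we shall use, throughout, the symbol ◊ to indicate the completion
of the proof to a Theorem, Lemma, etc.)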

THEOREM 1: Assume as a regularity condition that $\underline{\omega}$, the parameters of
the joint distribution of the manifest variables, $f_Y(\underline{y}; \underline{\omega})$, are identified. Then, the
vector of model parameters, $\underline{\beta}$, is identified if and only if the mapping of $B$ onto $\Omega$,
denoted by $\underline{\omega}(\underline{\beta})$, is invertible.

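Proof: (sufficiency) Assume that $\underline{\omega}(\underline{\beta})$ is invertible. Therefore, $\underline{\omega}(\underline{\beta})$ must be
one-to-one. Then for $\underline{\beta} \neq \underline{\beta}^*$, we know $\underline{\omega}(\underline{\beta}) \neq \underline{\omega}(\underline{\beta}^*)$. Also, by the identifiability of
$\underline{\omega}$, we then have that $f_Y(\underline{a}; \underline{\omega}(\underline{\beta})) \neq f_Y(\underline{a}; \underline{\omega}(\underline{\beta}^*))$ for at least one $\underline{a}$ if $\underline{\omega}(\underline{\beta}) \neq \underline{\omega}(\underline{\beta}^*)$,
and therefore $f_Y(\underline{a}; \underline{\beta}) \neq f_Y(\underline{a}; \underline{\beta}^*)$, which represent the densities written as
functions of $\underline{\beta}$ and $\underline{\beta}^*$, for at least that same $\underline{a}$. Thus, $\underline{\beta}$ is identified.

(necessity) Assume $\underline{\beta}$ is identified. Then there exists an $\underline{a}$ such that
$f(\underline{a}; \underline{\omega}(\underline{\beta})) \neq f(\underline{a}; \underline{\omega}(\underline{\beta}^*))$ for every $\underline{\beta}$ and $\underline{\beta}^*$ such that $\underline{\beta} \neq \underline{\beta}^*$. Now, by Lemma 1, $\underline{\omega}(\underline{\beta})$ is a
function. Therefore we know that one $\underline{\beta}$ is not mapped to two different $\underline{\omega}$’s; thus
we are only left with showing that for $\underline{\beta} \neq \underline{\beta}^*$ we have $\underline{\omega}(\underline{\beta}) \neq \underline{\omega}(\underline{\beta}^*)$. We do this by
showing a contradiction.

Assume, now, that $\underline{\omega}(\underline{\beta})$ is not one-to-one. Then there exists a $\underline{\beta} \neq \underline{\beta}^*$
such that $\underline{\omega}(\underline{\beta}) = \underline{\omega}(\underline{\beta}^*)$, i.e., two different $\underline{\beta}$’s are mapped to the same $\underline{\omega}$.
This contradicts the identifiability of $\underline{\beta}$, as we know there exists an $\underline{a}$ such that
$f(\underline{a}; \underline{\omega}(\underline{\beta})) \neq f(\underline{a}; \underline{\omega}(\underline{\beta}^*))$, for all $\underline{\beta} \neq \underline{\beta}^*$. Therefore, the mapping $\underline{\omega}(\underline{\beta})$ is one-to-one.
By the definition given at the beginning of this section, we know the mapping
$\underline{\omega}(\underline{\beta})$ is also onto. Thus $\underline{\omega}(\underline{\beta})$ is invertible and the proof is complete. ◊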
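
Theorem 1 can be illustrated numerically with the simple linear ME model with normal errors, $Y_1 = \beta_{10} + \beta_{11}x + \epsilon$ and $Y_2 = x + u$. The Python sketch below writes out the mapping $\underline{\omega}(\underline{\beta})$ for that model using the standard moment formulas, which assume that $x$, $\epsilon$ and $u$ are mutually independent; the function name and the two parameter vectors are hypothetical choices made for the illustration.

```python
from typing import Sequence, Tuple


def omega(beta: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Map beta = (b10, b11, var_e, var_u, mu_x, var_x) to the five joint parameters.

    Returns (mean of Y1, mean of Y2, var of Y1, cov of Y1 and Y2, var of Y2).
    """
    b10, b11, var_e, var_u, mu_x, var_x = beta
    return (
        b10 + b11 * mu_x,
        mu_x,
        b11 * b11 * var_x + var_e,
        b11 * var_x,
        var_x + var_u,
    )


# Two different model parameter vectors chosen to share one image under omega.
beta_a = (1.0, 2.0, 1.0, 1.0, 3.0, 2.0)
beta_b = (2.2, 1.6, 2.6, 0.5, 3.0, 2.5)
```

Both vectors are mapped, up to floating point rounding, to means 7 and 3, variances 9 and 3, and covariance 4. The mapping is therefore not one-to-one, and by Theorem 1 the six model parameters are not identified from these two manifest variables alone.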

Prior to stating the next theorem, we give a definition. A just identified set of
parameters, in this application, exists when $\dim(\underline{\omega}) = \dim(\underline{\beta})$ and $\underline{\omega}(\underline{\beta})$ is invertible.
This definition is used in the next theorem.




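THEOREM 2: Assume that $\underline{\omega}$, the vector of parameters of $f_Y(\underline{y}; \underline{\omega})$, is
identified. If $\dim(\underline{\omega}) > \dim(\underline{\beta})$, then the vector $\underline{\beta}$ is identified if there exists a just
identified, proper subvector of $\underline{\omega}(\underline{\beta})$, denoted by $\underline{\omega}_1(\underline{\beta}) = \underline{\omega}_1$ (i.e., there exists a
subvector of functions, $\underline{\omega}_1(\underline{\beta})$, having the same dimension as $\underline{\beta}$, such that $\underline{\omega}_1^{-1}(.)$
exists satisfying $\underline{\omega}_1^{-1}(\underline{\omega}_1(\underline{\beta})) = \underline{\beta}$ for all $\underline{\beta} \in B$).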

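Proof: Assume there is a just identified, proper subvector of $\underline{\omega}(\underline{\beta})$ denoted
by $\underline{\omega}_1(\underline{\beta}) = \underline{\omega}_1$. So, $\underline{\omega}_1(\underline{\beta})$ is such that $\dim(\underline{\omega}_1(\underline{\beta})) = \dim(\underline{\beta}) < \dim(\underline{\omega})$ and it is
invertible, by definition. This implies that $\underline{\beta} = \underline{\omega}_1^{-1}(\underline{\omega}_1(\underline{\beta}))$. Now let $\underline{\omega} = (\underline{\omega}_1', \underline{\omega}_2')'$
and define $\underline{\omega}^*(\underline{\omega}) = \underline{\omega}_1^{-1}(\underline{\omega}_1)$, where $\underline{\omega}_1^{-1}$ denotes the inverse of the invertible
subvector $\underline{\omega}_1(\underline{\beta})$. Then $\underline{\omega}^*(\underline{\omega}(\underline{\beta})) = \underline{\omega}_1^{-1}(\underline{\omega}_1(\underline{\beta}))$, which is equal to $\underline{\beta}$. Thus $\underline{\omega}^*(.)$ is
the inverse of $\underline{\omega}(.)$. By Theorem 1, $\underline{\beta}$ is identified. ◊ [Note: In the existence of an
invertible, proper subvector, $\underline{\omega}_1(\underline{\beta})$, of $\underline{\omega}(\underline{\beta})$, we have

$$\begin{bmatrix} \underline{\omega}_1(\underline{\beta}) \\ \underline{\omega}_2(\underline{\beta}) \end{bmatrix} = \begin{bmatrix} \underline{\omega}_1 \\ \underline{\omega}_2 \end{bmatrix}.$$

That is, the $\underline{\omega}_2$ parameters are redundant with $\underline{\omega}_1$.]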

To see these theorems applied to a simple case, consider the simple linear ME model with normal errors defined in Chapter 2. Using the notation from the current chapter the model would be

$$Y_1 = \beta_{10} + \beta_{11} x + e$$
$$Y_2 = x + u,$$

where $e \sim N(0, \sigma_e^2)$ and $u \sim N(0, \sigma_u^2)$. Further, under the structural model, assuming normality, we have $x \sim N(\mu_x, \sigma_x^2)$. Therefore $\beta = (\beta_{10}, \beta_{11}, \sigma_e^2, \sigma_u^2, \mu_x, \sigma_x^2)'$ and $\dim(\beta) = 6$. (Note that the notation for a variance has changed from that defined in Chapter 2. We shall use, for example, $\sigma_x^2$ in place of $\sigma_{xx}$, because the double subscript notation for a variance, combined with the numerous subscripts required for the GSME model, often proves too cumbersome and invites confusion. Therefore we will use the notation with the squared exponent to represent a variance for the remainder of this dissertation.)

The joint distribution is

$$f(y_1, y_2) \sim MVN\left( [\mu_{Y_1}, \mu_{Y_2}]', \begin{pmatrix} \sigma_{Y_1}^2 & \sigma_{Y_1Y_2} \\ \sigma_{Y_1Y_2} & \sigma_{Y_2}^2 \end{pmatrix} \right)$$

and $\dim(\omega) = 5$. Recall that a necessary condition for $\beta$ to be identified, stated prior to Lemma 1, was that $\dim(\omega) \geq \dim(\beta)$. So the first indication that this model may not be identified without additional information, as we already know that it is not, is that $\dim(\beta) > \dim(\omega)$. Because of the dimensions, we may not apply Theorem 2. Further, we know that $\omega(\beta)$ is not invertible, so there is no $\omega^{-1}$ giving $\omega^{-1}(\omega(\beta)) = \beta$, and by Theorem 1 we may not conclude that $\beta$ is identified. If we can assume a known parameter in $\beta$, then $\dim(\beta) = \dim(\omega)$ and we then have hopes that $\beta$ is identified. As we have seen in Subsection 2.2.2, if we assume $\sigma_u^2$ is known, then the solution, $\omega^{-1}(\omega(\beta)) = \beta$, is

$$\beta_{11} = (m_{Y_2}^2 - \sigma_u^2)^{-1} m_{Y_2Y_1},$$
$$(\sigma_x^2, \sigma_e^2) = (m_{Y_2}^2 - \sigma_u^2,\; m_{Y_1}^2 - \beta_{11} m_{Y_2Y_1}),$$
$$(\mu_x, \beta_{10}) = (\bar{Y}_2,\; \bar{Y}_1 - \beta_{11}\bar{Y}_2),$$

where $m$ denotes a sample variance or covariance and $\bar{Y}_r$ a sample mean.


So, in this case where $\sigma_u^2$ is known, we may apply Theorem 1, stating that the vector of the parameters of the joint marginal distribution is invertible and thus $\beta$ is identified.
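The inversion just described can be checked numerically. The following is a minimal sketch in Python, assuming population moments in place of sample moments; the function names `omega` and `omega_inverse` and the parameter values are illustrative only and are not part of the original development.

```python
def omega(beta10, beta11, var_e, var_u, mu_x, var_x):
    """Map the model parameters to the five moments of (Y1, Y2)."""
    mu_y1 = beta10 + beta11 * mu_x
    mu_y2 = mu_x
    var_y1 = beta11 ** 2 * var_x + var_e
    cov_y1y2 = beta11 * var_x
    var_y2 = var_x + var_u
    return mu_y1, mu_y2, var_y1, cov_y1y2, var_y2


def omega_inverse(mu_y1, mu_y2, var_y1, cov_y1y2, var_y2, var_u):
    """Recover the five free parameters when var_u is treated as known."""
    var_x = var_y2 - var_u
    beta11 = cov_y1y2 / var_x
    var_e = var_y1 - beta11 * cov_y1y2
    mu_x = mu_y2
    beta10 = mu_y1 - beta11 * mu_y2
    return beta10, beta11, var_e, mu_x, var_x


# Hypothetical parameter values, chosen only for illustration.
true = dict(beta10=1.0, beta11=2.0, var_e=0.5, var_u=0.25, mu_x=3.0, var_x=4.0)
recovered = omega_inverse(*omega(**true), var_u=true["var_u"])
print(recovered)  # (beta10, beta11, var_e, mu_x, var_x)
```

With $\sigma_u^2$ supplied, the round trip returns the free parameters, which is the invertibility that Theorem 1 requires.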

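Now, with the addition of an instrumental variable which is linearly related to $x$, we also have that $Y_3 = \beta_{30} + \beta_{31} x + v$, where we may assume that $v \sim N(0, \sigma_v^2)$. So now

$$\beta = (\beta_{10}, \beta_{11}, \sigma_e^2, \sigma_u^2, \beta_{30}, \beta_{31}, \sigma_v^2, \mu_x, \sigma_x^2)'$$

with $\dim(\beta) = 9$ and

$$f(y_1, y_2, y_3) \sim MVN\left( [\mu_{Y_1}, \mu_{Y_2}, \mu_{Y_3}]', \begin{pmatrix} \sigma_{Y_1}^2 & \sigma_{Y_1Y_2} & \sigma_{Y_1Y_3} \\ \sigma_{Y_1Y_2} & \sigma_{Y_2}^2 & \sigma_{Y_2Y_3} \\ \sigma_{Y_1Y_3} & \sigma_{Y_2Y_3} & \sigma_{Y_3}^2 \end{pmatrix} \right).$$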

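So under this new joint marginal distribution, $\dim(\omega) = 9$, and therefore, by the necessary condition stated prior to Lemma 1, we have hopes that the model may be identified, as we already know it is, since $\dim(\beta) \leq \dim(\omega)$. Again, because of the dimensions of the parameter vectors, we cannot apply Theorem 2. We do know, however, that we can arrive at $\omega^{-1}(\omega) = \beta$, which is given by

$$\beta_{11} = \frac{m_{Y_1Y_3}}{m_{Y_2Y_3}},$$
$$(\mu_x, \beta_{10}) = (\bar{Y}_2,\; \bar{Y}_1 - \beta_{11}\bar{Y}_2),$$
$$(\beta_{31}, \beta_{30}) = \left(\frac{m_{Y_1Y_3}}{m_{Y_2Y_1}},\; \bar{Y}_3 - \beta_{31}\bar{Y}_2\right),$$
$$\sigma_x^2 = \frac{m_{Y_2Y_1}\, m_{Y_2Y_3}}{m_{Y_1Y_3}},$$
$$(\sigma_e^2, \sigma_u^2, \sigma_v^2) = (m_{Y_1}^2 - \beta_{11} m_{Y_2Y_1},\; m_{Y_2}^2 - \sigma_x^2,\; m_{Y_3}^2 - \beta_{31} m_{Y_2Y_3}).$$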


Again, we see here that $\omega$ has an inverse, so applying Theorem 1, we have that $\beta$ is identified.

Now suppose a second instrumental variable, $Y_4$, is available that is also related to $x$, but conditionally independent of the other manifest variables. Suppose this IV is an independent, unbiased measure of $x$, estimated from some other method. So $Y_4$ is estimated by a method independent of that used for $Y_2$, and is different and conditionally independent from the initial IV, $Y_3$. Therefore, still having the same models and assumptions for $Y_2$, $Y_3$, and $x$, we now also have $Y_4 = x + t$, where $t \sim N(0, \sigma_t^2)$. Under these conditions, the vector of model parameters, which we assume to have a unique solution, is now

$$\beta = (\beta_{10}, \beta_{11}, \sigma_e^2, \sigma_u^2, \beta_{30}, \beta_{31}, \sigma_v^2, \sigma_t^2, \mu_x, \sigma_x^2)',$$


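where $\dim(\beta) = 10$. Further, the joint marginal distribution is now

$$f(y_1, y_2, y_3, y_4) \sim MVN\left( [\mu_{Y_1}, \mu_{Y_2}, \mu_{Y_3}, \mu_{Y_4}]', \begin{pmatrix} \sigma_{Y_1}^2 & \sigma_{Y_1Y_2} & \sigma_{Y_1Y_3} & \sigma_{Y_1Y_4} \\ \sigma_{Y_1Y_2} & \sigma_{Y_2}^2 & \sigma_{Y_2Y_3} & \sigma_{Y_2Y_4} \\ \sigma_{Y_1Y_3} & \sigma_{Y_2Y_3} & \sigma_{Y_3}^2 & \sigma_{Y_3Y_4} \\ \sigma_{Y_1Y_4} & \sigma_{Y_2Y_4} & \sigma_{Y_3Y_4} & \sigma_{Y_4}^2 \end{pmatrix} \right)$$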

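with $\dim(\omega) = 14$. We see here that there are again hopes that $\beta$ is identified due to the necessary condition that $\dim(\beta) \leq \dim(\omega)$. In fact, since $\dim(\omega)$ is strictly greater than $\dim(\beta)$, we must consider Theorem 2. There do exist invertible, proper subvectors of $\omega$ such that $\omega_1^{-1}(\omega_1(\beta)) = \beta$. One such just identified subvector is given by

$$\beta_{11} = \frac{m_{Y_1Y_3}}{m_{Y_2Y_3}},$$
$$(\sigma_x^2, \sigma_e^2) = (m_{Y_2Y_4},\; m_{Y_1}^2 - \beta_{11}^2\sigma_x^2),$$
$$(\mu_x, \beta_{10}) = (\bar{Y}_2,\; \bar{Y}_1 - \beta_{11}\bar{Y}_2),$$
$$(\beta_{31}, \beta_{30}) = \left(\frac{m_{Y_2Y_3}}{\sigma_x^2},\; \bar{Y}_3 - \beta_{31}\bar{Y}_2\right),$$
$$(\sigma_u^2, \sigma_v^2, \sigma_t^2) = (m_{Y_2}^2 - \sigma_x^2,\; m_{Y_3}^2 - \beta_{31}^2\sigma_x^2,\; m_{Y_4}^2 - \sigma_x^2).$$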


The remaining elements of $\omega(\beta)$, which were not used to arrive at the above estimators, are given by the vector

$$\omega_2 = (\mu_{Y_4}, \sigma_{Y_1Y_2}, \sigma_{Y_1Y_4}, \sigma_{Y_3Y_4})'.$$

So we see that there exists an invertible proper subvector, $\omega_1$, with $\dim(\omega_1) = \dim(\beta)$, and thus by Theorem 2, we may conclude that $\beta$ is identified.
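As a hedged illustration of Theorem 2 in this four-variable setting, the Python sketch below builds the 14 moments from a hypothetical $\beta$, inverts only the ten moments in the just identified subvector, and then confirms that the four unused moments are reproduced by the recovered parameters. Population moments stand in for sample moments, and all names and values are assumptions made for the example.

```python
def omega(b):
    """Moments of (Y1, Y2, Y3, Y4) implied by the model parameters b."""
    vx = b["var_x"]
    return {
        "mu1": b["beta10"] + b["beta11"] * b["mu_x"],
        "mu2": b["mu_x"],
        "mu3": b["beta30"] + b["beta31"] * b["mu_x"],
        "mu4": b["mu_x"],
        "var1": b["beta11"] ** 2 * vx + b["var_e"],
        "var2": vx + b["var_u"],
        "var3": b["beta31"] ** 2 * vx + b["var_v"],
        "var4": vx + b["var_t"],
        "cov12": b["beta11"] * vx,
        "cov13": b["beta11"] * b["beta31"] * vx,
        "cov14": b["beta11"] * vx,
        "cov23": b["beta31"] * vx,
        "cov24": vx,
        "cov34": b["beta31"] * vx,
    }


def omega1_inverse(w):
    """Invert the just identified subvector; mu4, cov12, cov14, cov34 are unused."""
    beta11 = w["cov13"] / w["cov23"]
    vx = w["cov24"]
    beta31 = w["cov23"] / vx
    return {
        "beta10": w["mu1"] - beta11 * w["mu2"],
        "beta11": beta11,
        "var_e": w["var1"] - beta11 ** 2 * vx,
        "var_u": w["var2"] - vx,
        "beta30": w["mu3"] - beta31 * w["mu2"],
        "beta31": beta31,
        "var_v": w["var3"] - beta31 ** 2 * vx,
        "var_t": w["var4"] - vx,
        "mu_x": w["mu2"],
        "var_x": vx,
    }


# Hypothetical parameter values, chosen only for illustration.
beta = {"beta10": 1.0, "beta11": 2.0, "var_e": 0.5, "var_u": 0.25,
        "beta30": -1.0, "beta31": 0.5, "var_v": 1.0, "var_t": 0.75,
        "mu_x": 3.0, "var_x": 4.0}
w = omega(beta)
recovered = omega1_inverse(w)
w_again = omega(recovered)  # the unused moments are regenerated from beta
redundant = ["mu4", "cov12", "cov14", "cov34"]
```

The agreement of `w_again` with `w` on the `redundant` keys is the sense in which the $\omega_2$ parameters carry no information beyond $\omega_1$.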


CHAPTER  4 

ESTIMATION  OF  THE  G.S.M.E.  MODEL 

Assuming the parameters of the GSME model are identified, we proceed to develop a method of estimation. Initially in this chapter, we define the three different types of manifest variables potentially included in the GSME model.

After discussing the different types of variables possible, we utilize the conditional independence assumption to write the joint distribution function of the manifest variables in terms of the conditional distributions and the distribution of $x$. Ultimately, we show how the joint distribution can be approximated by an extension of the Latent Class model and, using a method of generalized least squares, how to obtain estimated generalized nonlinear least squares estimates of the GSME model parameters. We do this, for notational convenience, assuming all variables are univariate and satisfy the conditional independence assumption. Then we show an important example: a generalization to handle a multivariate regression for the outcome of interest, which consists of at least one group of conditionally dependent variables. The procedure and its concepts extend to the general multivariate case.

We saw in Chapter 3 that a key component in defining a GSME model is the relation between the conditional distribution parameters of the manifest variables (i.e., the $\theta_r$ parameters) and the model parameters (i.e., the $\beta$ parameters) describing the relation between the parameters modeled (i.e., the $\gamma_r$ parameters) for each manifest variable and $x$. These relations are defined by the mappings in Equation 3.4. Recall that the relationships are defined by the inverse mappings $\theta_r = T_r^{-1}(h_r(x; \beta_r))$. We shall discuss the procedure used to estimate the model parameters, $\beta = (\beta_1', \beta_2', \ldots, \beta_p', \theta_x')'$. The broad applicability of the GSME model is due, in part, to the fact that it can handle situations involving any type of variable. This means the model may be applied to all types of categorical and non-categorical data. We distinguish between continuous variables, categorical variables following a multinomial distribution (which are discrete variables taking on a finite number of values), and "other" discrete variables, i.e., those which do not follow a multinomial distribution but take on discrete values.

In Chapter 3, each $Y_r$ had dimension $p_r$. One such vector could possibly contain all three types of variables: continuous, discrete quantitative, and multinomial. Let $Y_{(c)r}$ denote the vector of continuous variables contained in the $r$th manifest vector of variables, $Y_r$, $Y_{(d)r}$ the vector of discrete quantitative variables contained in $Y_r$, and $Y_{(m)r}$ the vector of multinomial-categorical variables contained in $Y_r$. This allows $Y_r$ to range over all possibilities from being a univariate variable of one particular type to a vector of all three types of variables.

To describe the joint marginal distribution of the $Y_r$, $r = 1, 2, \ldots, p$, as that given at the beginning of Section 3.3 of Chapter 3, we initially define $f_r(y_r \mid x, \theta_r)$ as a generic conditional distribution for $Y_r$. So, for the conditional distribution of $Y_r$, we have

$$f_r(y_r \mid x, \theta_r) = \left[ f_r(y_{(c)r} \mid y_{(d)r}, y_{(m)r}, x; \theta_{(c)r}) \right]^{I(\text{continuous})} \times \left[ p_r(y_{(d)r} \mid y_{(m)r}, x; \theta_{(d)r}) \right]^{I(\text{discrete})} \times \left[ P(Y_{(m)r} = y_{(m)r} \mid x; \theta_{(m)r}) \right]^{I(\text{multinomial})},$$

where $f_r(y_{(c)r} \mid y_{(d)r}, y_{(m)r}, x; \theta_{(c)r})$ is a multivariate continuous density for $Y_{(c)r}$, $p_r(y_{(d)r} \mid y_{(m)r}, x; \theta_{(d)r})$ is a multivariate discrete mass function for $Y_{(d)r}$, $P(Y_{(m)r} = y_{(m)r} \mid x; \theta_{(m)r})$ are multinomial probabilities defined by the cell probabilities of the cross-classification of the elements of $Y_{(m)r}$, and $I(\cdot)$ are indicators defined to be one if $Y_r$ contains the type of variable in parentheses, or zero otherwise.




Under the assumption of conditional independence of the $Y_r$ given $x$, $r = 1, 2, \ldots, p$, we can write the joint marginal distribution of the manifest variables as:

$$f_Y(y_1, y_2, \ldots, y_p) = \int_x \prod_{r=1}^{p} f_r(y_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx, \tag{4.1}$$

where $\int_x$ denotes integration over $x$. The conditional distribution of the outcome variable of interest, $Y_{r^*}$, may be any of the conditional distributions on the right hand side of Equation 4.1; therefore we know that $r^* \in \{1, 2, \ldots, p\}$. In attempting to use Equation 4.1 for parameter estimation, a problem arises because the joint distribution on the left hand side is almost always unknown and the integral on the right hand side often cannot be evaluated. Therefore, we must develop a methodology based on Equation 4.1 in order to derive estimates for the model parameters. The notation needed for this estimation procedure becomes very cumbersome when the model is written in its full generality, as in Equation 4.1. Therefore, for ease of discussion, we show in detail the estimation procedure assuming all variables are univariate and that there are no violations of the conditional independence assumption. Thus, there is no need for the grouping of conditionally dependent variables.

After showing the entire estimation process for the all univariate case, we show a generalizing example of the GSME model where the outcome of interest requires multivariate regression for a group of continuous variables.

The estimation procedure, in summary, amounts to categorizing all variables which are not initially categorical and developing the joint distribution for the resulting categorical variables. In doing so, it is seen that the analog to Equation 4.1 becomes a form of the Latent Class model and the joint distribution after categorization follows a multinomial distribution.


4.1  The  Univariate  Case 

Assume, for notational convenience, that all manifest variables are univariate and satisfy the model assumption of conditional independence given $x$. Without loss of generality, arrange the $p$ manifest variables such that $Y_r$, $1 \leq r \leq p'$, are the continuous variables, $Y_r$, $p' + 1 \leq r \leq p''$, are the "other" discrete variables, and $Y_r$, $p'' + 1 \leq r \leq p$, are the categorical variables that follow multinomial distributions.

It is useful, when describing the estimation procedure, to define more distinctive notation for the conditional distributions of the three types of manifest variables, instead of letting all of them be represented generically by $f_r(y_r \mid x, \theta_r)$, $r = 1, 2, \ldots, p$. Suppose that the form of the distribution function for $x$ is known and represented by $f(x \mid \theta_x)$. Suppose further that the continuous manifest variables have conditional distribution functions with known form denoted by $f_r(y_r \mid x, \theta_r)$, for $1 \leq r \leq p'$, where $\theta_r \in \Theta_r$ are $k_r \times 1$ vectors of parameters for each conditional distribution. Suppose that the discrete quantitative variables have known probability mass functions denoted by $p_r(y_r \mid x, \theta_r)$, for $p' + 1 \leq r \leq p''$, where, again, $\theta_r \in \Theta_r$ are $k_r \times 1$ vectors of conditional distribution parameters. The remaining variables, $Y_r$, $p'' + 1 \leq r \leq p$, are "categorical," each following a multinomial distribution. For these variables, we let $Y_r$ have $C_r$ categories, with known conditional cell probabilities given by $P(Y_r = j_r \mid x, \theta_r)$, $j_r = 1, 2, \ldots, C_r$. The dimension of the column vector, $\theta_r$, of conditional cell probabilities is $k_r = C_r - 1$ and it lies in a parameter space $\Theta_r$.

4.1.1  Translation  to  Latent  Class  Model 

The first step towards estimation of the parameters in the GSME model is to categorize all continuous and discrete variables and then derive an equation analogous to Equation 4.1, the joint marginal distribution, for categorical variables. Here we describe the categorization method and discuss important notation. First, categorize each continuous $Y_r$ into $C_r$ mutually exclusive and exhaustive categories. For each $Y_r$ define intervals $(l_{rj_r}, u_{rj_r})$, $r = 1, 2, \ldots, p'$ and $j_r = 1, 2, \ldots, C_r$. Denote the interval defining the $j_r$th category as $I_{rj_r}$ and $\text{length}(I_{rj_r})$ by $L_{rj_r}$.

To indicate the probability that a continuous manifest variable falls into a certain interval, we write $P(Y_r \in I_{rj_r})$. Note the intervals or "categories" need not be of equal length.

The "other" discrete variables, i.e., the non-continuous variables not following a multinomial distribution, may also need to be categorized. The process by which the $Y_r$, $p' + 1 \leq r \leq p''$, are categorized is essentially the same as that for the continuous manifest variables. For these discrete variables requiring categorization, $I_{rj_r}$ may be thought of as a collection of contiguous values of the discrete variable.

It is possible in an applied context that a discrete variable of this type may take on such a small number of values that no further categorization is needed. In these cases each $I_{rj_r}$ would actually represent a single value, as opposed to a collection of discrete values. We shall denote the probability that a discrete manifest variable falls into the set of values $I_{rj_r}$ as $P(Y_r \in I_{rj_r})$.

The discrete manifest random variables that follow a multinomial distribution are categorical variables that require no further categorization. Therefore $Y_r$, $p'' + 1 \leq r \leq p$, remain unchanged. (If collapsing of categories is required it should be done a priori and, hence, is already reflected in the definitions of $Y_r$, $r = p'' + 1, \ldots, p$.) Recall that the total number of categories of $Y_r$ is denoted by $C_r$, $p'' + 1 \leq r \leq p$, and that these variables have conditional cell probabilities given by $P(Y_r = j_r \mid x, \theta_r)$, where $j_r = 1, 2, \ldots, C_r$. Since nothing changes for the categorical manifest variables after the categorization step, the notation for the conditional cell probabilities remains the same as used previously.

Let $x$ be categorized in a similar manner as mentioned above for the continuous manifest variables, where its categories are indexed by the subscript $j_x$. The categories are denoted by $I_{j_x}$, where $\text{length}(I_{j_x}) = L_{j_x}$, $j_x = 1, 2, \ldots, C_x$, and the probability that $x$ falls into one of its intervals is denoted $P(x \in I_{j_x})$. Let $x_{j_x}$ represent the midpoint of interval $j_x$.
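The categorization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cut points are hypothetical, intervals are taken to be closed on the left so that the categories are mutually exclusive and exhaustive, and the outermost intervals are truncated to a finite range so that midpoints exist.

```python
from bisect import bisect_right


def categorize(value, cut_points):
    """Return the 1-based index j of the interval that contains value.

    cut_points are the interior boundaries; a value equal to a boundary is
    assigned to the interval on its right.
    """
    return bisect_right(cut_points, value) + 1


def midpoints(lower, upper, cut_points):
    """Midpoints of the intervals after truncating the range to [lower, upper]."""
    edges = [lower] + list(cut_points) + [upper]
    return [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]


cuts = [-1.0, 0.0, 0.5, 2.0]  # hypothetical boundaries; unequal lengths are allowed
categories = [categorize(v, cuts) for v in (-3.2, -1.0, 0.3, 0.5, 7.0)]
mids = midpoints(-4.0, 4.0, cuts)
print(categories, mids)
```

Four interior cut points give five categories, and the midpoints play the role of the $x_{j_x}$ when the same routine is applied to $x$.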

An important note, now that the categorization has been discussed, is that the difference between the "discrete" and "categorical" manifest variables as discussed at the beginning of this chapter is actually intended to mean the difference between non-continuous variables requiring further categorization and non-continuous variables where no modification is needed. By this we mean that there may be scenarios, depending on sample size perhaps, in which a "discrete" manifest variable may actually require no categorization or a "categorical" manifest variable may require further categorization. As previously alluded to, if a discrete variable needed no further categorization, then the $I_{rj_r}$ for that variable would each represent a single value of that variable. This may occur in instances where a discrete variable takes on a small number of values. Further categorizing a categorical variable would entail combining possible realizations of that variable and defining new categories, and thus new conditional cell probabilities, based on groups of the original categories, as mentioned above. This may be required if there is an unusually large number of categories for a categorical variable. We use the "discrete" and "categorical" terminology because in practice, discrete variables will generally be those that require categorization, and categorical variables will generally be those that will not require further modification.




The joint distribution on the left hand side of Equation 4.1 after categorization is a multinomial distribution. For the categorized variables, Equation 4.1 becomes

$$\pi_{j_1j_2\ldots j_p} = P(Y_1 \in I_{1j_1}, \ldots, Y_{p''} \in I_{p''j_{p''}}, Y_{p''+1} = j_{p''+1}, \ldots, Y_p = j_p)$$
$$= \sum_{j_x} \prod_{r=1}^{p''} P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \beta_r) \times \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x \in I_{j_x}, \beta_r)\, P(x \in I_{j_x} \mid \theta_x)$$
$$= \pi_{j_1j_2\ldots j_p}(\beta), \tag{4.2}$$

where $\pi_{j_1j_2\ldots j_p}$ indicates the probability of falling into multinomial cell $j_1j_2\ldots j_p$, for $j_r = 1, 2, \ldots, C_r$, $r = 1, 2, \ldots, p$. The last equality is to note that we write these cell probabilities as functions of the model parameters, $\beta$, again, by substituting for the distributional parameters the mappings defined in Equation 3.4, which are in terms of $x$ and $\beta_r$, i.e., $\theta_r = T_r^{-1}(h_r(x; \beta_r))$, for all $r$, when calculating the cell probabilities. This equation is a variation of the classical Latent Class model which was discussed in detail by Lazarsfeld and Henry [38]. Latent Class models are those which may be applied to cross tabulated data. The idea is that a possibly complex relationship between observed variables can be explained by a simple relationship that holds for unobservable latent variables. The data for Latent Class models are characteristics, discrete-valued variables, that are assumed to be indicators of some underlying, unobservable concept or latent variable [63, p. 194]. In other words, a Latent Class model can be thought of as an expression which includes at least one covariate that may be considered to be measured with error. In the simplest form of the Latent Class model, the observable variables are dichotomous.

Note that in Equation 4.2 the integral over $x$ has become a summation over $j_x$ and so Equation 4.2 can be seen as a Latent Class model. The model parameters enter nonlinearly in the right side of Equation 4.2 through the conditional probabilities, which are calculated for $x$ by

$$P(x \in I_{j_x} \mid \theta_x) = \int_{I_{j_x}} f(x \mid \theta_x)\, dx \approx f(x_{j_x} \mid \theta_x) L_{j_x},$$

for $I_{j_x}$ of small length, where, recall, $\text{length}(I_{j_x}) = L_{j_x}$. (Note that if any particular type of the three kinds of manifest variables does not exist in an applied context, assume a value of unity throughout the following methodology for that variable's density and ignore the associated integration or summation.) For the cell probabilities of a continuous $Y_r$, we have

$$P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \beta_r) = \frac{P(Y_r \in I_{rj_r}, x \in I_{j_x} \mid \beta_r)}{P(x \in I_{j_x} \mid \theta_x)} = \frac{\int_{I_{rj_r}} \int_{I_{j_x}} f_r(y_r, x \mid \theta_{rx})\, dx\, dy_r}{\int_{I_{j_x}} f(x \mid \theta_x)\, dx}$$
$$\approx \frac{\int_{I_{rj_r}} f_r(y_r, x_{j_x} \mid \theta_{rx}) L_{j_x}\, dy_r}{f(x_{j_x} \mid \theta_x) L_{j_x}} = \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \beta_r)\, dy_r,$$

for $I_{j_x}$ of small length, $dy_r$ corresponding to the integration over $I_{rj_r}$, and where $\theta_{rx}$ are the (unknown) parameters of the joint distribution of $Y_r$ and $x$. If $Y_r$ is discrete the probabilities are

$$P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \beta_r) \approx \sum_{y_r \in I_{rj_r}} p_r(y_r \mid x = x_{j_x}, \beta_r),$$

where the summation is over the discrete values defining the set of values, $I_{rj_r}$.

In practice, it is during these previous calculations that the model parameters, i.e., the elements of $\beta = (\beta_1', \beta_2', \ldots, \beta_p', \theta_x')'$, enter the right hand side of Equation 4.2 by substituting the conditional distributional parameters of the conditional densities with the functions of these model parameters whose mappings were defined in Equation 3.4. We write $f_r(y_r \mid x = x_{j_x}, \beta_r)$, but it should be understood that this notation represents $f_r(y_r \mid x = x_{j_x}, T_r^{-1}(h_r(x_{j_x}; \beta_r)))$, which of course equals $f_r(y_r \mid x = x_{j_x}, \theta_r)$. The notation for the non-continuous variables, $p_r(y_r \mid x = x_{j_x}, \beta_r)$, should be interpreted similarly.
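To give a feel for how the small-interval approximation $P(x \in I_{j_x} \mid \theta_x) \approx f(x_{j_x} \mid \theta_x) L_{j_x}$ behaves, the Python sketch below compares it with the exact interval probability for a standard normal $x$. The normal choice, the interval location, and the widths are assumptions made purely for illustration.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def normal_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))


def interval_probability(lo, hi, mu, sd):
    """Exact P(lo < x < hi) for x ~ N(mu, sd^2)."""
    return normal_cdf(hi, mu, sd) - normal_cdf(lo, mu, sd)


def midpoint_approximation(lo, hi, mu, sd):
    """Density at the interval midpoint times the interval length."""
    return normal_pdf((lo + hi) / 2, mu, sd) * (hi - lo)


errors = {}
for width in (1.0, 0.5, 0.1):
    exact = interval_probability(0.2, 0.2 + width, 0.0, 1.0)
    approx = midpoint_approximation(0.2, 0.2 + width, 0.0, 1.0)
    errors[width] = abs(exact - approx)
    print(f"L = {width}: exact = {exact:.5f}, approx = {approx:.5f}")
```

The absolute error shrinks as the interval length decreases, which is why the approximation is stated for $I_{j_x}$ of small length.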




In its full generality, for univariate manifest variables, Equation 4.2 becomes

$$\pi_{j_1j_2\ldots j_p}(\beta) = \sum_{j_x} \prod_{r=1}^{p'} \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \beta_r)\, dy_r \times \prod_{r=p'+1}^{p''} \sum_{y_r \in I_{rj_r}} p_r(y_r \mid x = x_{j_x}, \beta_r) \times \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x = x_{j_x}, \beta_r)\, f(x_{j_x} \mid \theta_x) L_{j_x}, \tag{4.3}$$

where $\pi_{j_1j_2\ldots j_p}(\beta)$ indicates the cell probability written in terms of the model parameters. Essentially, then, if the $r$th variable is continuous, then $\int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \beta_r)\, dy_r$ is included in the product, and if it is discrete requiring categorization, then $\sum_{y_r \in I_{rj_r}} p_r(y_r \mid x = x_{j_x}, \beta_r)$ is included. Again, note that if there are no variables of a given type (continuous, discrete, or categorical), then the factor associated with that type of variable is dropped from Equation 4.3.
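A small numerical instance of Equation 4.3 may help fix ideas. The Python sketch below assumes one continuous manifest variable, $Y_1 \mid x \sim N(\beta_{10} + \beta_{11}x, \sigma_e^2)$, one binary categorical variable with a logistic cell probability, and $x \sim N(\mu_x, \sigma_x^2)$. These distributional choices, the parameter values, and the interval edges are all hypothetical and serve only to show the mechanics of summing over the midpoints $x_{j_x}$.

```python
import math


def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))


def cell_probabilities(beta, y1_edges, x_edges):
    """Approximate pi_{j1 j2}(beta) by summing over the midpoints of the x intervals."""
    b10, b11, sd_e, a0, a1, mu_x, sd_x = beta
    n1 = len(y1_edges) - 1
    pi = [[0.0, 0.0] for _ in range(n1)]
    for lo, hi in zip(x_edges[:-1], x_edges[1:]):
        x_mid = (lo + hi) / 2
        weight = norm_pdf(x_mid, mu_x, sd_x) * (hi - lo)    # f(x_jx | theta_x) L_jx
        p_y2 = 1 / (1 + math.exp(-(a0 + a1 * x_mid)))       # P(Y2 = 1 | x = x_jx)
        mean_y1 = b10 + b11 * x_mid
        for j1 in range(n1):
            # integral of the conditional normal density over the j1-th interval
            p_y1 = (norm_cdf(y1_edges[j1 + 1], mean_y1, sd_e)
                    - norm_cdf(y1_edges[j1], mean_y1, sd_e))
            pi[j1][0] += p_y1 * (1 - p_y2) * weight
            pi[j1][1] += p_y1 * p_y2 * weight
    return pi


beta = (1.0, 2.0, 1.0, -0.5, 1.5, 0.0, 1.0)          # hypothetical values
y1_edges = [-math.inf, -1.0, 1.0, 3.0, math.inf]     # four categories for Y1
x_edges = [-6.0 + 0.1 * k for k in range(121)]       # 120 short intervals for x
pi = cell_probabilities(beta, y1_edges, x_edges)
total = sum(sum(row) for row in pi)
print(total)
```

Because the $x$ intervals are short and cover essentially all of the mass of $x$, the eight approximate cell probabilities sum to a value very close to one.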

4.1.2  Estimation  of  Model  Parameters 

In order to estimate the parameters of the extended Latent Class model defined by Equation 4.3, we note that the sample proportions of the data falling into the categories indicated on the left side of Equation 4.3, i.e., the $(N - 1) \times 1$ vector $\hat{\pi}$, are the maximum likelihood estimators of $P(Y_1 \in I_{1j_1}, \ldots, Y_{p''} \in I_{p''j_{p''}}, Y_{p''+1} = j_{p''+1}, \ldots, Y_p = j_p)$, with corresponding $(N - 1)$-dimensional vector of cell probabilities $\pi$, $N = \prod_{r=1}^{p} C_r$. In the estimation procedure, the last element of $\hat{\pi}$ and $\pi$ must be dropped to avoid a singularity in the variance-covariance matrix, hence the dimensions of those column vectors are $N - 1$. Due to the invariance property of maximum likelihood estimators, substituting $\hat{\pi}$ on the left hand side of Equation 4.3 and solving for the model parameters, $\beta$, on the right hand side as functions of the elements of $\hat{\pi}$ gives solutions that are maximum likelihood estimates. This is possible when $N - 1 = \dim(\beta)$. If $N - 1 > \dim(\beta)$, then there is no exact solution to the equations setting $\hat{\pi}_{j_1j_2\ldots j_p}$ equal to the right hand side of Equation 4.3, and iterative generalized nonlinear least squares, similar to that proposed by Shah [52] for diagnostic tests in the usual Latent Class models, can be used to obtain least squares solutions.

Here, we will give a brief motivation for the use of the method of estimation developed by Shah [52], which was based on Estimated Generalized Nonlinear Least Squares (henceforth, EGNLS) in the special case where the $Y_r$ were binary variables for diagnostic tests and no covariates were involved. We generalize this method for use in solving GSME models. EGNLS amounts to maximization of an approximating multivariate normal likelihood. The resulting estimates are shown, in subsequent sections, to be weakly consistent and asymptotically normal. The estimation procedure is developed from the initial relationship presented in Equation 4.3 and concisely summarized in vector form as

$$\hat{\pi} = \pi(\beta) + \varepsilon, \tag{4.4}$$

where $\beta = (\beta_1', \ldots, \beta_p', \theta_x')'$ is the $(l_1 + l_2 + \cdots + l_p + k_x)$-dimensional column vector of model parameters and $\hat{\pi}$ is the vector of sample proportions, which are unbiased and consistent maximum likelihood estimates of the multinomial cell probabilities, $P(Y_1 \in I_{1j_1}, \ldots, Y_{p''} \in I_{p''j_{p''}}, Y_{p''+1} = j_{p''+1}, \ldots, Y_p = j_p) = \pi_{j_1j_2\ldots j_p}$, for all possible combinations of the categorized manifest variables. Further, the multinomial cell probabilities, $\pi(\beta)$, may be written as nonlinear functions of the model parameters, as defined by the right hand side of Equation 4.3. Recall that from Equation 4.3, we have

$$\pi_{j_1j_2\ldots j_p}(\beta) = \sum_{j_x} \prod_{r=1}^{p'} \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \beta_r)\, dy_r \times \prod_{r=p'+1}^{p''} \sum_{y_r \in I_{rj_r}} p_r(y_r \mid x = x_{j_x}, \beta_r) \times \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x = x_{j_x}, \beta_r)\, f(x_{j_x} \mid \theta_x) L_{j_x},$$

where $\pi_{j_1j_2\ldots j_p}(\beta)$ is the cell probability associated with the $(j_1, j_2, \ldots, j_p)$th cell of the cross-classification of the categorized manifest variables, where $j_r = 1, 2, \ldots, C_r$, for all $r$. In Equation 4.4, $\hat{\pi}$ is the vector of estimates of the $\pi_{j_1j_2\ldots j_p}$ and $\pi(\beta)$ is the vector of corresponding cell probabilities, which are written as nonlinear functions of the model parameters. We further have that $E(\varepsilon) = 0$ and $Var(\varepsilon) = n^{-1}V$, where $V = Cov(U_i)$ and $U_i$ denotes an observation vector on the $i$th subject with the $l$th element defined by $U_{il} = 1$ if the subject falls into cell $l$ of $\pi$ from the joint multinomial distribution, and $U_{il} = 0$ otherwise, $l = 1, 2, \ldots, N - 1$, $N = \prod_{r=1}^{p} C_r$. Therefore, $V = \text{diag}(\pi(\beta)) - \pi(\beta)\pi(\beta)'$. Because of the linear dependency $\mathbf{1}'\hat{\pi} = 1$ among all $N$ proportions, $V$ would be a singular matrix if the last element of $\pi$ were not dropped. In the estimation procedure, recall, the $N$th component of $\hat{\pi}$, $U_i$ and $\pi$ is deleted so that $V$ is an $(N - 1) \times (N - 1)$ positive definite matrix.
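The role of dropping the last cell can be seen in a short Python sketch. The cell probabilities below are hypothetical. For the full vector, every row of $\text{diag}(\pi) - \pi\pi'$ sums to zero, so the vector of ones lies in its null space; after the last element is dropped, the $i$th row sums to $\pi_i \pi_N$, which is strictly positive.

```python
def multinomial_covariance(pi):
    """Return diag(pi) - pi pi' as a list of rows."""
    return [[(p_i if i == j else 0.0) - p_i * p_j for j, p_j in enumerate(pi)]
            for i, p_i in enumerate(pi)]


pi_full = [0.1, 0.2, 0.3, 0.4]                       # hypothetical cell probabilities
v_full = multinomial_covariance(pi_full)             # N x N, singular
v_reduced = multinomial_covariance(pi_full[:-1])     # (N - 1) x (N - 1)

row_sums_full = [sum(row) for row in v_full]
row_sums_reduced = [sum(row) for row in v_reduced]
print(row_sums_full, row_sums_reduced)
```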

Generalized nonlinear least squares estimates of $\beta$ are obtained by minimizing the quadratic form

$$Q(\beta, V) = n\,(\hat{\pi} - \pi(\beta))'\, V^{-1}\, (\hat{\pi} - \pi(\beta)),$$

with respect to $\beta$. Estimation is hindered by the fact that $V$ is unknown and depends on the parameters $\beta$. EGNLS amounts to minimizing

$$Q(\beta, \hat{V}) = n\,(\hat{\pi} - \pi(\beta))'\, \hat{V}^{-1}\, (\hat{\pi} - \pi(\beta)), \tag{4.5}$$

where $\hat{V}$ is any calculated, consistent, positive definite estimator of $V$, and therefore is not a function of the parameters. This requirement for $\hat{V}^{-1}$ should hold as long as there are no zero counts in any cell.
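As a minimal sketch of the EGNLS criterion, the Python code below evaluates $Q(\beta, \hat{V})$ for a toy one-parameter map $\pi(\beta) = (\beta^2, 2\beta(1-\beta), (1-\beta)^2)$ and locates its minimizer on a grid. The toy map, the cell counts, and the grid search are assumptions made for illustration and are not the GSME cell probabilities; $\hat{V}$ is formed from the sample proportions with the last cell dropped.

```python
def pi_of_beta(beta):
    """Toy nonlinear cell-probability map (hypothetical), last cell dropped."""
    return [beta ** 2, 2 * beta * (1 - beta)]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def q(beta, pi_hat, v_hat_inv, n):
    """Q(beta, V_hat) = n (pi_hat - pi(beta))' V_hat^{-1} (pi_hat - pi(beta))."""
    d = [ph - p for ph, p in zip(pi_hat, pi_of_beta(beta))]
    return n * sum(d[i] * v_hat_inv[i][j] * d[j] for i in range(2) for j in range(2))


counts = [30, 50, 20]                        # hypothetical cell counts
n = sum(counts)
pi_hat = [c / n for c in counts[:-1]]        # sample proportions, last cell dropped
v_hat = [[pi_hat[0] * (1 - pi_hat[0]), -pi_hat[0] * pi_hat[1]],
         [-pi_hat[0] * pi_hat[1], pi_hat[1] * (1 - pi_hat[1])]]
v_hat_inv = inverse_2x2(v_hat)

grid = [k / 1000 for k in range(1, 1000)]
beta_hat = min(grid, key=lambda b: q(b, pi_hat, v_hat_inv, n))
print(beta_hat, q(beta_hat, pi_hat, v_hat_inv, n))
```

Here $N - 1 = 2$ exceeds $\dim(\beta) = 1$, so no $\beta$ reproduces the sample proportions exactly and the minimizer is a least squares compromise. A grid search is used only to keep the example short.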

The solutions which minimize this quadratic form are obtained by setting its derivative, with respect to $\beta'$, equal to a vector of zeros and solving for the elements of $\beta$. For the GSME model this is no trivial task, due to the difficulty in deriving the first derivative of Equation 4.5. A further difficulty arises from the need to use the second derivative of $Q(\beta, \hat{V})$ in developing the asymptotic properties of the EGNLS estimators of the GSME model. The difficulties arise, in part, from the numerous and complex derivatives of matrices and vectors, but also from differentiating the compound mappings involved in $\pi(\beta)$. Before we derive the first derivative of Equation 4.5, we state and prove a general chain rule for derivatives of multiply compounded, multivariable functions, which has not previously been written, to this author's knowledge.

Recall that $\pi(\beta)$ involves the vectors $T^{-1}$ and $h$, which are defined to be column vectors whose elements are the individual mappings used to relate the parameters of the conditional distribution of $Y_r$ given $x$ to $x$. When written in terms of $\beta$, we abbreviate the vector of multinomial probabilities as $\pi(\beta)$, where it should be understood that this is a symbolic representation of the entire vector $\pi(T^{-1}(h(x; \beta)))$. Schott [49, p. 327] established a rule for the derivative of a compound function that is vector valued. Using his notation, he states that if $y$ and $g$ are real valued functions, $f$ is a vector of functions, and $x$ is a vector such that $y(x) = g(f(x))$, then

$$\frac{\partial}{\partial x_i} y(x) = \sum_{k} \left( \frac{\partial g}{\partial f_k} \right) \left( \frac{\partial f_k}{\partial x_i} \right), \tag{4.6}$$

for $i = 1, 2, \ldots, n$. These derivatives with respect to the $x_i$ can be placed into one vector, $\frac{\partial y}{\partial x'}$, which can be written as

$$\frac{\partial}{\partial x'} y(x) = \left( \frac{\partial g}{\partial f'} \right) \left( \frac{\partial f}{\partial x'} \right).$$

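We generalize this to multiply compounded functions of $F$ vector valued functions in the following result.

Result: Let $y(x) = g(f_F(\cdots(f_2(f_1(x)))))$ be a general multiply compounded, multivariable function. Then the derivative of $y(x)$ with respect to $x'$ can be written as

$$\frac{\partial}{\partial x'} y(x) = \left( \frac{\partial g}{\partial f_F'} \right) \left( \frac{\partial f_F}{\partial f_{F-1}'} \right) \cdots \left( \frac{\partial f_2}{\partial f_1'} \right) \left( \frac{\partial f_1}{\partial x'} \right).$$

Proof: By Schott's result, the result holds for $F = 1$. That is, for $y(x) = g(f_1(x))$ we may write

$$\frac{\partial}{\partial x_i} y(x) = \sum_{k} \left( \frac{\partial g}{\partial f_{1k}} \right) \left( \frac{\partial f_{1k}}{\partial x_i} \right)$$

and

$$\frac{\partial}{\partial x'} y(x) = \left( \frac{\partial g}{\partial f_1'} \right) \left( \frac{\partial f_1}{\partial x'} \right).$$

Now let $y(x) = g(f_2(f_1(x)))$. Treating $f_2(f_1(x))$ as a vector of functions of $x$, Equation 4.6 gives $\frac{\partial}{\partial x_i} y(x) = \sum_j \left( \frac{\partial g}{\partial f_{2j}} \right) \left( \frac{\partial f_{2j}}{\partial x_i} \right)$. Further, for the $j$th element of $f_2$, $f_{2j}$, we have

$$\frac{\partial}{\partial x_i} f_{2j}(f_1(x)) = \sum_{k} \left( \frac{\partial f_{2j}}{\partial f_{1k}} \right) \left( \frac{\partial f_{1k}}{\partial x_i} \right)$$

by Equation 4.6. Combining these gives

$$\frac{\partial}{\partial x_i} y(x) = \sum_{j} \sum_{k} \left( \frac{\partial g}{\partial f_{2j}} \right) \left( \frac{\partial f_{2j}}{\partial f_{1k}} \right) \left( \frac{\partial f_{1k}}{\partial x_i} \right)$$

as the $i$th element of the row vector $\left( \frac{\partial g}{\partial f_2'} \right) \left( \frac{\partial f_2}{\partial f_1'} \right) \left( \frac{\partial f_1}{\partial x'} \right)$. This process may be repeated $F$ times, giving the result that if $y(x) = g(f_F(\cdots(f_2(f_1(x)))))$, then

$$\frac{\partial}{\partial x'} y(x) = \left( \frac{\partial g}{\partial f_F'} \right) \cdots \left( \frac{\partial f_2}{\partial f_1'} \right) \left( \frac{\partial f_1}{\partial x'} \right),$$

which is arrived at by repeatedly replacing the summations with matrix multiplication, as was done above. ◊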
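The result can be checked numerically for $F = 2$. The Python sketch below uses arbitrary smooth functions $f_1$, $f_2$, and $g$ (all hypothetical, chosen only so that the Jacobians are easy to write down) and compares the product of Jacobians with a central finite-difference gradient.

```python
import math


def f1(x):
    return [x[0] * x[1], x[0] + 2 * x[1]]


def jac_f1(x):
    return [[x[1], x[0]], [1.0, 2.0]]


def f2(u):
    return [math.exp(u[0]), u[0] * u[1]]


def jac_f2(u):
    return [[math.exp(u[0]), 0.0], [u[1], u[0]]]


def g(v):
    return v[0] + v[1] ** 2


def grad_g(v):
    return [[1.0, 2 * v[1]]]  # 1 x 2 row vector


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def y(x):
    return g(f2(f1(x)))


def chain_rule_gradient(x):
    """(dg/df2') (df2/df1') (df1/dx') evaluated at x."""
    u = f1(x)
    v = f2(u)
    return matmul(matmul(grad_g(v), jac_f2(u)), jac_f1(x))[0]


def finite_difference_gradient(x, h=1e-6):
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((y(up) - y(down)) / (2 * h))
    return grad


x0 = [0.3, -0.7]
analytic = chain_rule_gradient(x0)
numeric = finite_difference_gradient(x0)
print(analytic, numeric)
```

The two gradients agree to within the accuracy of the finite-difference approximation, as the matrix form of the chain rule predicts.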

To apply this general result for multiply compounded, multivariable functions to the GSME model we let $y(x) = \pi_j(\beta)$ and $g(f_2(f_1(x))) = \pi_j(T^{-1}(h(x; \beta)))$, for $j = 1, 2, \ldots, N$, $N = \prod_{r=1}^{p} C_r$. Placing the resulting row vectors of the derivative of $\pi_j(T^{-1}(h(x; \beta)))$ with respect to $\beta'$ one atop another gives us the $(N - 1) \times v$ matrix

$$\frac{\partial \pi(\beta)}{\partial \beta'} = \frac{\partial}{\partial \beta'} \pi\left( T^{-1}(h(x; \beta)) \right),$$

where $v$ is the dimension of $\beta$ and $N - 1$ is the dimension of $\pi(\beta)$, due to the need to drop the $N$th element to avoid the linear dependency which is created because the cell probabilities sum to one. In general, using the result proven above, the $(j, i)$th element of $\frac{\partial \pi(\beta)}{\partial \beta'}$ is

$$\left( \frac{\partial \pi_j}{\partial (T^{-1})'} \right) \left( \frac{\partial T^{-1}}{\partial h'} \right) \left( \frac{\partial h}{\partial \beta_i} \right),$$

where the last $k_x$ of the $\beta_i$ should be understood to simply be symbolic representations of the elements of $\theta_x$. This derivative will be used in what follows, especially later in the chapter in developing the asymptotic properties of the estimators.

We will simply be writing $\frac{\partial \pi(\beta)}{\partial \beta'}$ to represent the entire $(N - 1) \times v$ matrix of the derivatives of the multiply compounded, multivariable vector valued function, $\pi(\beta)$.

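Let us recall that we are trying to minimize the quadratic form $Q(\underline{\beta}, \hat{V})$ given
in Equation 4.5. To write out the derivative required to perform this minimization,
we will need to apply the result proven above along with another result from Schott
[49] concerning the derivative of a quadratic form of a vector $\underline{x}$ and a symmetric
matrix $A$. Schott [49, p. 329-330] shows that $\frac{\partial}{\partial \underline{x}'} \underline{x}' A \underline{x} = 2 \underline{x}' A$. Applied to our
quadratic form and letting $(\hat{\underline{\pi}} - \underline{\pi}(\underline{\beta})) = \Delta\underline{\pi}$, we have

$$\frac{\partial Q(\underline{\beta}, \hat{V})}{\partial \Delta\underline{\pi}'} = 2n \left( \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}) \right)' \hat{V}^{-1}$$

and therefore

$$\frac{\partial}{\partial \underline{\beta}'} Q(\underline{\beta}, \hat{V}) = 2n \left( \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}) \right)' \hat{V}^{-1} \frac{\partial \Delta\underline{\pi}}{\partial \underline{\pi}'} \frac{\partial \underline{\pi}(\underline{\beta})}{\partial \underline{\beta}'}.$$

Because $\partial \Delta\underline{\pi} / \partial \underline{\pi}' = -I$, where $I$ is the identity matrix, we have

$$\frac{\partial}{\partial \underline{\beta}'} Q(\underline{\beta}, \hat{V}) = -2n \left( \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}) \right)' \hat{V}^{-1} \frac{\partial \underline{\pi}(\underline{\beta})}{\partial \underline{\beta}'}.$$

Again, it is known that setting this derivative equal to a vector of zeros and solving
for $\underline{\beta}$ minimizes the quadratic form in Equation 4.5, i.e., solving $\frac{\partial}{\partial \underline{\beta}'} Q(\underline{\beta}, \hat{V}) = \underline{0}'$
for $\underline{\beta}$, provided that only a global minimum exists. This solution, however,
is the same as the solution when using the estimating equations
$\underline{S}_n(\underline{\beta}) = \left[ \frac{\partial}{\partial \underline{\beta}'} Q(\underline{\beta}, \hat{V}) \right]' / (-2n) = \underline{0}$. Therefore, we solve the following system of equations:

$$\underline{S}_n(\underline{\beta}) = \left( \frac{\partial \underline{\pi}(\underline{\beta})}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left( \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}) \right) = \underline{0}. \tag{4.7}$$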

In the special case where the $Y_r$, $r = 1, 2, \ldots, p$, are indicators (0,1) of the
results (no disease, disease) of $p$ diagnostic tests, Shah [52] suggests using sample
proportions, $\hat{\underline{\pi}}$, to estimate $V$ consistently. The Continuous Mapping Theorem [51,
p. 24] states that a function of a consistent estimate is consistent for
the same function of the estimate's probability limit, provided the function is
continuous with probability one. We know that $V = \mathrm{diag}(\underline{\pi}(\underline{\beta})) - \underline{\pi}(\underline{\beta})\underline{\pi}(\underline{\beta})'$
and that $\hat{\underline{\pi}}$ are sample proportions which are unbiased and consistent estimates of $\underline{\pi}(\underline{\beta})$.
Therefore, our consistent estimator of $V$ shall be $\hat{V} = \mathrm{diag}(\hat{\underline{\pi}}) - \hat{\underline{\pi}}\hat{\underline{\pi}}'$, provided
all the elements of $\hat{\underline{\pi}}$ are strictly between zero and one. This is to guarantee
that $\hat{V}$ is positive definite. Another possible choice as an estimate of $V$ could be
$\mathrm{diag}(\underline{\pi}(\tilde{\underline{\beta}})) - \underline{\pi}(\tilde{\underline{\beta}})\underline{\pi}(\tilde{\underline{\beta}})'$, where $\tilde{\underline{\beta}}$ is any consistent estimate of $\underline{\beta}$ and, again, provided
that the elements of $\underline{\pi}(\tilde{\underline{\beta}})$ are greater than zero, but less than one, to ensure that $\hat{V}$
is positive definite.
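
As an illustration of the first estimator, the following minimal sketch computes $\hat{\underline{\pi}}$ and $\hat{V} = \mathrm{diag}(\hat{\underline{\pi}}) - \hat{\underline{\pi}}\hat{\underline{\pi}}'$ from a list of cell counts, dropping the last cell as described earlier. The function names are hypothetical, and the Cholesky-based check is only one convenient way to confirm positive definiteness; it is not taken from the SAS programs used in this work.

```python
def multinomial_covariance(counts):
    """Return (pi_hat, V_hat) for the first N-1 cells, V_hat = diag(pi_hat) - pi_hat pi_hat'."""
    n = sum(counts)
    props = [c / n for c in counts]
    if any(p <= 0.0 or p >= 1.0 for p in props):
        raise ValueError("every sample proportion must lie strictly between 0 and 1")
    pi_hat = props[:-1]  # drop the Nth cell to avoid the sum-to-one dependency
    m = len(pi_hat)
    v_hat = [[(pi_hat[i] if i == j else 0.0) - pi_hat[i] * pi_hat[j]
              for j in range(m)] for i in range(m)]
    return pi_hat, v_hat

def is_positive_definite(matrix):
    """Attempt a Cholesky factorization of a symmetric matrix; success means it is positive definite."""
    m = len(matrix)
    lower = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = matrix[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                lower[i][i] = s ** 0.5
            else:
                lower[i][j] = s / lower[j][j]
    return True
```

For example, `multinomial_covariance([20, 30, 50])` keeps the first two cells, and `is_positive_definite` applied to the resulting matrix returns `True`. An empty cell raises an error, mirroring the proviso that all proportions lie strictly between zero and one.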

A widely used method for computing nonlinear least squares estimates is the modified
Gauss-Newton method, which will be used to estimate the $\underline{\beta}$ parameters on the
right hand side of Equation 4.4 (see, for example, [31]). We use the modified
procedure because a known drawback of the standard Gauss-Newton procedure is that
there is no guarantee that the standard algorithm will converge. In fact, Wolfe
[64, p. 230] states that “A major difficulty with the Gauss-Newton Method is the
initial estimate is seldom sufficiently close to ensure convergence to the true value.”
Further, the modified Gauss-Newton method, “whilst sharing the advantageous features
of the Gauss-Newton method, has the additional merits of a guaranteed convergence”
[31, p. 269]. Hence, we utilize the modified Gauss-Newton procedure.

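The first step in deriving the EGNLS estimates is to substitute a first order
Taylor series approximation about consistent, initial/trial values, $\underline{\beta}_T$, to the
nonlinear function $\underline{\pi}(\underline{\beta})$ in the quadratic form $Q(\underline{\beta}, \hat{V})$. The approximation is
$\underline{\pi}(\underline{\beta}) \approx \underline{\pi}(\underline{\beta}_T) + \frac{\partial \underline{\pi}(\underline{\beta}_T)}{\partial \underline{\beta}'} (\underline{\beta} - \underline{\beta}_T)$, where throughout $\frac{\partial \underline{\pi}(\underline{\beta}_T)}{\partial \underline{\beta}'}$ should be interpreted
as the derivative of $\underline{\pi}(\underline{\beta})$ with respect to $\underline{\beta}'$ evaluated at the initial estimate $\underline{\beta}_T$, i.e.,

$$\frac{\partial \underline{\pi}(\underline{\beta}_T)}{\partial \underline{\beta}'} = \left. \frac{\partial \underline{\pi}(\underline{\beta})}{\partial \underline{\beta}'} \right|_{\underline{\beta} = \underline{\beta}_T}.$$

Using this approximation, it can be shown (see, for example,
[20, p. 221]) that the approximating quadratic form is minimized by

$$\underline{\beta}_M = \underline{\beta}_T + \left[ \left( \frac{\partial \underline{\pi}(\underline{\beta}_T)}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left( \frac{\partial \underline{\pi}(\underline{\beta}_T)}{\partial \underline{\beta}'} \right) \right]^{-1} \left( \frac{\partial \underline{\pi}(\underline{\beta}_T)}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left[ \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}_T) \right]. \tag{4.8}$$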


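There is no guarantee that $\underline{\beta}_M$ is better than $\underline{\beta}_T$ for approximating the least
squares estimate, $\hat{\underline{\beta}}$, in the sense that $Q(\underline{\beta}_M, \hat{V}) < Q(\underline{\beta}_T, \hat{V})$. However, one can
show that there exists a $\lambda^*$ such that all points $\underline{\beta}$, defined as $\underline{\beta} = \underline{\beta}_T + \lambda(\underline{\beta}_M - \underline{\beta}_T)$,
$0 < \lambda < \lambda^*$, are such that $Q(\underline{\beta}, \hat{V}) < Q(\underline{\beta}_T, \hat{V})$. The remaining two steps
describe the modified Gauss-Newton method. Letting $\underline{\beta}_k$ denote the estimate after
$k$ iterations, compute, at iteration $k + 1$,

$$D_k = \left[ \left( \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left( \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} \right) \right]^{-1} \left( \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left[ \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}_k) \right]. \tag{4.9}$$

Next, find a $\lambda_k$ between 0 and 1 such that $Q(\underline{\beta}_k + \lambda_k D_k, \hat{V}) < Q(\underline{\beta}_k, \hat{V})$ and
set the new estimate as $\underline{\beta}_{k+1} = \underline{\beta}_k + \lambda_k D_k$. One can use any reasonable convergence
criteria to determine if the algorithm has converged, e.g., based on the distance
between estimates at iteration $k$ and $k + 1$.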
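
The iteration just described can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the PROC NLIN/PROC IML programs used in applications: the step length is found by halving from 1 (one common choice for $\lambda_k$), the small linear solver is included only to keep the sketch self-contained, and the two-cell model `hw_pi` with Jacobian `hw_jac` is a hypothetical example with a single parameter.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

def quadratic_form(resid, v_hat, n):
    """Q = n * resid' V^{-1} resid, computed through a linear solve."""
    return n * sum(r * w for r, w in zip(resid, solve(v_hat, resid)))

def modified_gauss_newton(pi_fn, jac_fn, pi_hat, v_hat, n, beta0, tol=1e-10, max_iter=100):
    """Iterate beta_{k+1} = beta_k + lambda_k D_k with D_k as in Equation 4.9."""
    beta = list(beta0)
    for _ in range(max_iter):
        resid = [ph - p for ph, p in zip(pi_hat, pi_fn(beta))]
        jac = jac_fn(beta)                      # (N-1) x v matrix of derivatives
        m, v = len(jac), len(beta)
        vinv_j = [solve(v_hat, [row[c] for row in jac]) for c in range(v)]
        vinv_r = solve(v_hat, resid)
        jtvj = [[sum(jac[i][a] * vinv_j[b][i] for i in range(m)) for b in range(v)]
                for a in range(v)]
        jtvr = [sum(jac[i][a] * vinv_r[i] for i in range(m)) for a in range(v)]
        step = solve(jtvj, jtvr)                # D_k
        q_old = quadratic_form(resid, v_hat, n)
        lam = 1.0
        while lam > 1e-12:                      # halve lambda until Q decreases
            trial = [b + lam * d for b, d in zip(beta, step)]
            r_new = [ph - p for ph, p in zip(pi_hat, pi_fn(trial))]
            if quadratic_form(r_new, v_hat, n) < q_old:
                break
            lam /= 2.0
        else:
            return beta                         # no improving step: treat as converged
        if max(abs(lam * d) for d in step) < tol:
            return trial
        beta = trial
    return beta

# Hypothetical one-parameter model for the first two of three cells.
def hw_pi(beta):
    b = beta[0]
    return [b * b, 2.0 * b * (1.0 - b)]

def hw_jac(beta):
    b = beta[0]
    return [[2.0 * b], [2.0 - 4.0 * b]]
```

With proportions generated exactly from `hw_pi([0.3])` and the corresponding $\hat{V}$, starting the iteration at 0.5 should return a value very close to 0.3. The convergence rule here is the distance between successive estimates, as suggested above.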

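Here, we develop an expression for $\partial \underline{\pi}(\underline{\beta}_k) / \partial \underline{\beta}'$ in Equation 4.9 where we do so
through repeated use of the chain rule. The result of the derivative is a
$(N - 1) \times (l_1 + l_2 + \cdots + l_p + k_x)$ matrix of derivatives, i.e.,

$$\frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} = \left[ \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}_1'}, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}_2'}, \; \ldots, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}_p'}, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\theta}_x'} \right],$$

where

$$\frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}_r'} = \left[ \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \beta_{r1}}, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \beta_{r2}}, \; \ldots, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \beta_{r l_r}} \right], \text{ for all } r, \text{ and}$$

$$\frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\theta}_x'} = \left[ \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \theta_1}, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \theta_2}, \; \ldots, \; \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \theta_{k_x}} \right].$$

The dimension of each $\partial \underline{\pi}(\underline{\beta}_k) / \partial \underline{\beta}_r'$ is $(N - 1) \times l_r$ and that of
$\partial \underline{\pi}(\underline{\beta}_k) / \partial \underline{\theta}_x'$ is $(N - 1) \times k_x$.

We shall show these derivatives in terms of a generic, individual $\beta$ for the
$r'$th variable, i.e., $\beta_{r'i}$, say. This should be interpreted as the $i$th parameter,
$i = 1, 2, \ldots, l_{r'}$, for the $r'$th manifest variable. When the derivative is performed for
the parameters of $x$, we show the derivative with respect to the $i$th parameter of
$\underline{\theta}_x$, i.e., $\theta_i$, where in this case $i = 1, 2, \ldots, k_x$. Recall for use in what follows that the
abbreviated notation for $T_r^{-1}(h_r(x; \underline{\beta}_r))$ is $T_r^{-1}$ and for $h_r(x; \underline{\beta}_r)$ the abbreviation
is $h_r$, for each $r$.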


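Given the model in Equation 4.3, the derivative of $\pi_{j_1 j_2 \ldots j_p}(\underline{\beta})$ with respect to
the $i$th parameter of interest for the $r'$th manifest variable is

$$\frac{\partial \pi_{j_1 j_2 \ldots j_p}(\underline{\beta})}{\partial \beta_{r'i}} = \frac{\partial \sum_{j_x} ABCD}{\partial \beta_{r'i}} =
\begin{cases}
\sum_{j_x} \left[ \dfrac{\partial A}{\partial \beta_{r'i}} \right] BCD & \text{if } r' \le p' \\[2ex]
\sum_{j_x} \left[ \dfrac{\partial B}{\partial \beta_{r'i}} \right] ACD & \text{if } p' + 1 \le r' \le p'' \\[2ex]
\sum_{j_x} \left[ \dfrac{\partial C}{\partial \beta_{r'i}} \right] ABD & \text{if } p'' + 1 \le r' \le p \\[2ex]
\sum_{j_x} \left[ \dfrac{\partial D}{\partial \theta_i} \right] ABC & \text{if } \beta_{r'i} = \theta_i,
\end{cases} \tag{4.10}$$

where

$$\begin{aligned}
A &= \prod_{r=1}^{p'} \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r, \\
B &= \prod_{r=p'+1}^{p''} \sum_{y_r \in I_{rj_r}} p_r(y_r \mid x = x_{j_x}, \underline{\beta}_r), \\
C &= \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x = x_{j_x}, \underline{\beta}_r), \\
D &= f(x_{j_x} \mid \underline{\theta}_x) L_{j_x},
\end{aligned}$$

and for the derivatives, we have

$$\begin{aligned}
\frac{\partial A}{\partial \beta_{r'i}} &= \left[ \frac{\partial}{\partial \beta_{r'i}} \int_{I_{r'j_{r'}}} f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'}) \, dy_{r'} \right] \times \prod_{\substack{r = 1 \\ r \ne r'}}^{p'} \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r \\
&= \left[ \frac{\partial \int_{I_{r'j_{r'}}} f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'}) \, dy_{r'}}{\partial (T_{r'}^{-1})'} \, \frac{\partial T_{r'}^{-1}}{\partial h_{r'}'} \, \frac{\partial h_{r'}}{\partial \beta_{r'i}} \right] \times \prod_{\substack{r = 1 \\ r \ne r'}}^{p'} \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r
\end{aligned}$$

and

$$\frac{\partial D}{\partial \theta_i} = \frac{\partial D}{\partial f(x_{j_x} \mid \underline{\theta}_x)} \, \frac{\partial f(x_{j_x} \mid \underline{\theta}_x)}{\partial \theta_i} = \frac{\partial f(x_{j_x} \mid \underline{\theta}_x)}{\partial \theta_i} L_{j_x}.$$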

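The calculations for $\partial B / \partial \beta_{r'i}$ and $\partial C / \partial \beta_{r'i}$ are similar to those for $\partial A / \partial \beta_{r'i}$ except with
$\sum_{y_{r'} \in I_{r'j_{r'}}} p_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'})$ and $P(Y_{r'} = j_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'})$, respectively,
replacing $\int_{I_{r'j_{r'}}} f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'}) \, dy_{r'}$. The final step would be the concatenation
of the vectors in Equation 4.10 into the matrices $\partial \underline{\pi}(\underline{\beta}_k) / \partial \underline{\beta}_r'$ or $\partial \underline{\pi}(\underline{\beta}_k) / \partial \underline{\theta}_x'$. We shall discuss
$\frac{\partial}{\partial (T_{r'}^{-1})'} \int_{I_{r'j_{r'}}} f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'}) \, dy_{r'}$, which is part of $\partial A / \partial \beta_{r'i}$, in two cases in the next subsection
where we discuss the integration required in the categorization step described
earlier in this chapter. The first case will be when closed forms for the necessary
integration are available and the second case when no closed forms exist and
numerical integration must be utilized.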

In applications, PROC NLIN and PROC IML, from the SAS software package,
are used together to obtain EGNLS estimates of $\underline{\beta}$ by the modified Gauss-Newton
method. Other, more “programming” intensive packages could be used to obtain
these estimates via modified Gauss-Newton, such as OX or R.

4.1.3  Integration  Examples  of  the  Categorization  Step 

In this subsection we discuss the required integration over the continuous $Y_r$
which is needed to calculate the conditional cell probabilities in the categorization
step leading to the extended Latent Class Model in Equation 4.3. The probability
of falling into region $I_{rj_r}$ of the $r$th manifest variable, if that variable is continuous,
is found, in general, by the following integration:

$$\begin{aligned}
P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) &= \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r \\
&= \int_{l_{rj_r}}^{u_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r \\
&= F_{Y_r|x}(u_{rj_r}) - F_{Y_r|x}(l_{rj_r}),
\end{aligned}$$

where $(l_{rj_r}, u_{rj_r})$ are the endpoints of category $I_{rj_r}$ and $F_{Y_r|x}(\cdot)$ is the cumulative
distribution function for $Y_r$ given $x$. For distributions with closed form CDF’s,
such integrals can be precisely calculated. In other circumstances, one must use
numerical integration. Here we shall discuss both cases; one where all integrals have
closed form solutions and one where numerical integration must be implemented.
Whether this integral can be directly calculated or not affects how the derivatives
in the EGNLS step are calculated.

Case  1 

We have that $P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) = F_{Y_r|x}(u_{rj_r}) - F_{Y_r|x}(l_{rj_r})$, in general,
assuming closed form solutions of the CDF’s exist. In this case, the derivatives
required of the right hand side of Equation 4.3 can be calculated directly since
the CDF’s are in closed form and in terms of the compound mappings of model
parameters, i.e., $T_r^{-1}(h_r(x; \underline{\beta}_r))$, so one may proceed directly with EGNLS.

Suppose, for example, that we have univariate $Y_r$ which are distributed
Logistic$(\lambda_r, \rho_r)$, for each $r$, where $\lambda_r$ is the mean and the variance is $\rho_r^2 \pi^2 / 3$. It
is known that the logistic distribution is symmetric about its mean, with only
slightly heavier tails than the normal distribution. Therefore, in practice, the
assumption of a normally distributed variable is essentially interchangeable with
the logistic assumption. The logistic distribution has the advantage of having a
closed form CDF.




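For this example, we assume there are no discrete or categorical manifest
variables. Then it can be shown that for each $Y_r$,

$$P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) = \int_{l_{rj_r}}^{u_{rj_r}} (1/\rho_r) \exp\left[ -(y_r - \lambda_r)/\rho_r \right] \times \left[ 1 + \exp\left( -(y_r - \lambda_r)/\rho_r \right) \right]^{-2} dy_r,$$

where recall $(l_{rj_r}, u_{rj_r})$ is the notation indicating the endpoints of interval $I_{rj_r}$,
$j_r = 1, 2, \ldots, C_r$, in the categorized version of $Y_r$. Assuming one was interested
in modeling the mean (and, as a consequence of the GSME model specifications,
the variance) for each $Y_r$, the mappings of Equation 3.4, defined in the beginning
of Chapter 3, would be $\lambda_r = T_{r1} = h_{r1}(x; \underline{\beta}_{r1})$, where $T_{r1} = E(Y_r \mid x)$, and
$\rho_r^2 \pi^2 / 3 = T_{r2} = \beta_{r2}$, where $T_{r2} = \mathrm{Var}(Y_r \mid x)$ and $\beta_{r2}$ is a constant, for all $r$. Note
that $h_{r1}(\cdot)$ is any specified, perhaps nonlinear, mean function in terms of $x$ and $\underline{\beta}_{r1}$.
Using the closed form for the CDF of the logistic distribution, we have for each $r$

$$P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) = \left[ 1 + \exp\left( \frac{-\pi \left( u_{rj_r} - h_{r1}(x_{j_x}; \underline{\beta}_{r1}) \right)}{\sqrt{3\beta_{r2}}} \right) \right]^{-1} - \left[ 1 + \exp\left( \frac{-\pi \left( l_{rj_r} - h_{r1}(x_{j_x}; \underline{\beta}_{r1}) \right)}{\sqrt{3\beta_{r2}}} \right) \right]^{-1}.$$

The probabilities for the categories of $x$ are approximated as $P(x \in I_{j_x} \mid \underline{\theta}_x) =
f(x_{j_x} \mid \underline{\theta}_x) L_{j_x}$ and no integration is required, so the distribution of $x$ need not be
assumed to follow a logistic distribution, or any other distribution with a closed
form CDF.

The extended Latent Class Model in Equation 4.2, for the current example, is

$$P(Y_1 \in I_{1j_1}, Y_2 \in I_{2j_2}, \ldots, Y_p \in I_{pj_p}) = \sum_{j_x} \prod_{r=1}^{p} P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) P(x \in I_{j_x} \mid \underline{\theta}_x).$$

Here, closed form solutions for CDF’s allow the probabilities
on the right hand side of the equation to be in terms of the model parameters,
i.e., $\underline{\beta} = (\underline{\beta}_1', \underline{\beta}_2', \ldots, \underline{\beta}_p', \underline{\theta}_x')'$. So, the elements for the left hand side of
Equation 4.2, $P(Y_1 \in I_{1j_1}, Y_2 \in I_{2j_2}, \ldots, Y_p \in I_{pj_p})$, are estimated by multinomial
sample proportions, i.e., $\hat{\underline{\pi}}$, calculated from the data, and the right hand side,
$\sum_{j_x} \prod_{r=1}^{p} P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) P(x \in I_{j_x} \mid \underline{\theta}_x)$, are nonlinear functions of
the model parameters, i.e., $\underline{\beta}$. We are now ready to apply EGNLS to estimate the
GSME model parameters, $\underline{\beta}$.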
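
To make the closed-form case concrete, the following minimal sketch evaluates the logistic cell probabilities from the CDF expression above for one continuous variable at a single midpoint $x_{j_x}$. The linear mean function `linear_mean`, the cutpoints, and the convention that the two outer cells extend to infinity are assumptions made only for this illustration.

```python
import math

def logistic_cdf(y, mean, variance):
    """CDF of a logistic variable with the given mean and variance (scale = sqrt(3 * variance) / pi)."""
    z = math.pi * (y - mean) / math.sqrt(3.0 * variance)
    return 1.0 / (1.0 + math.exp(-z))

def cell_probabilities(cutpoints, mean, variance):
    """P(Y in (l, u)) for consecutive cutpoints; the outer cells extend to -inf and +inf."""
    cdf = [0.0] + [logistic_cdf(c, mean, variance) for c in cutpoints] + [1.0]
    return [upper - lower for lower, upper in zip(cdf[:-1], cdf[1:])]

def linear_mean(x, beta):
    """Hypothetical mean function h_r1(x; beta) = beta[0] + beta[1] * x."""
    return beta[0] + beta[1] * x
```

For instance, `cell_probabilities([-1.0, 0.0, 1.0], linear_mean(0.5, [0.0, 1.0]), 2.0)` returns four probabilities that sum to one. Evaluating this at every midpoint $x_{j_x}$ and weighting by $f(x_{j_x} \mid \underline{\theta}_x) L_{j_x}$ would give one variable's contribution to the right hand side of the model.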

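EGNLS requires consistent initial estimates for favorable asymptotic properties
of the estimators. We shall discuss a method for developing these consistent
estimates at the end of this chapter. Application of EGNLS involves derivatives
with respect to the model parameters and then using these derivatives in Equation
4.9. Since in the current example, closed form solutions of the necessary integration
required to calculate probabilities for the categories of the variables exist, the
required piece of $\partial A / \partial \beta_{r'i}$ for use in the EGNLS procedure is straightforward to
calculate. In the current example, the $T_{r'1}$ mapping is the identity for each variable,
so for the $r'$th variable, with $h_{r'1}$ abbreviating $h_{r'1}(x_{j_x}; \underline{\beta}_{r'1})$, we have:

$$\begin{aligned}
\frac{\partial P(Y_{r'} \in I_{r'j_{r'}} \mid x \in I_{j_x}, \underline{\beta}_{r'})}{\partial h_{r'1}}
&= \frac{\partial}{\partial h_{r'1}} \left\{ \left[ 1 + \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-1} - \left[ 1 + \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-1} \right\} \\
&= -\left[ 1 + \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-2} \frac{\partial}{\partial h_{r'1}} \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \\
&\quad + \left[ 1 + \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-2} \frac{\partial}{\partial h_{r'1}} \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \\
&= -\frac{\pi}{\sqrt{3\beta_{r'2}}} \left[ 1 + \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-2} \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \\
&\quad + \frac{\pi}{\sqrt{3\beta_{r'2}}} \left[ 1 + \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-2} \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right).
\end{aligned}$$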



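This part of $\partial A / \partial \beta_{r'i}$ would be used when taking derivatives with respect to the
elements of $\underline{\beta}_{r'1}$. EGNLS requires derivatives with respect to all the model parameters,
so we also must consider the derivative with respect to the constant $\beta_{r'2}$.
The compound mappings associated with this parameter are far less complex than
those for the elements of $\underline{\beta}_{r'1}$ due to the fact that a functional form must be specified for
$h_{r'1}(x_{j_x}; \underline{\beta}_{r'1})$, whereas the form for the constant variance is known. The necessary
pieces of the derivative $\partial A / \partial \beta_{r'i}$ reduce to $\frac{\partial}{\partial \underline{\beta}_{r'2}} \int_{I_{r'j_{r'}}} f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'}) \, dy_{r'}$, where $\underline{\beta}_{r'2}$ is the
scalar $\beta_{r'2}$. This derivative is

$$\begin{aligned}
\frac{\partial P(Y_{r'} \in I_{r'j_{r'}} \mid x \in I_{j_x}, \underline{\beta}_{r'})}{\partial \beta_{r'2}}
&= \frac{\partial}{\partial \beta_{r'2}} \left\{ \left[ 1 + \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-1} - \left[ 1 + \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-1} \right\} \\
&= -\left[ 1 + \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-2} \exp\left( \frac{-\pi (u_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \frac{3\pi (u_{r'j_{r'}} - h_{r'1})}{2 (3\beta_{r'2})^{3/2}} \\
&\quad + \left[ 1 + \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \right]^{-2} \exp\left( \frac{-\pi (l_{r'j_{r'}} - h_{r'1})}{\sqrt{3\beta_{r'2}}} \right) \frac{3\pi (l_{r'j_{r'}} - h_{r'1})}{2 (3\beta_{r'2})^{3/2}}.
\end{aligned}$$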

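These derivatives are used in computing the estimate at iteration $k + 1$, by finding

$$D_k = \left[ \left( \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left( \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} \right) \right]^{-1} \left( \frac{\partial \underline{\pi}(\underline{\beta}_k)}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left[ \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}_k) \right]$$

and then a $0 < \lambda_k < 1$ such that $Q(\underline{\beta}_k + \lambda_k D_k, \hat{V}) < Q(\underline{\beta}_k, \hat{V})$.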

In  any  application  where  all  distributions  can  be  integrated  in  a closed  form, 
such  as  the  one  just  discussed,  any  of  a number  of  computer  packages  may  be 
utilized  to  perform  the  necessary  calculations.  Examples  of  such  packages  are  SAS 
using  PROC  IML,  OX,  and  R. 
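
As a hedged illustration in one such general-purpose language, the sketch below codes the logistic cell probability together with its derivatives with respect to the mean $h_{r'1}$ and the variance $\beta_{r'2}$, so that the closed forms can be compared against finite differences. The function and argument names are hypothetical; `h` and `beta2` stand for $h_{r'1}(x_{j_x}; \underline{\beta}_{r'1})$ and $\beta_{r'2}$.

```python
import math

def logistic_cell_prob(lower, upper, h, beta2):
    """P(lower < Y < upper) for a logistic variable with mean h and variance beta2."""
    scale = math.sqrt(3.0 * beta2)

    def cdf(y):
        return 1.0 / (1.0 + math.exp(-math.pi * (y - h) / scale))

    return cdf(upper) - cdf(lower)

def d_prob_d_h(lower, upper, h, beta2):
    """Derivative of the cell probability with respect to the mean h."""
    scale = math.sqrt(3.0 * beta2)

    def term(y):
        e = math.exp(-math.pi * (y - h) / scale)
        return (math.pi / scale) * e / (1.0 + e) ** 2

    return term(lower) - term(upper)

def d_prob_d_beta2(lower, upper, h, beta2):
    """Derivative of the cell probability with respect to the variance beta2."""
    def term(y):
        e = math.exp(-math.pi * (y - h) / math.sqrt(3.0 * beta2))
        return e / (1.0 + e) ** 2 * 3.0 * math.pi * (y - h) / (2.0 * (3.0 * beta2) ** 1.5)

    return term(lower) - term(upper)
```

Comparing `d_prob_d_h` and `d_prob_d_beta2` with central differences of `logistic_cell_prob` is a cheap safeguard before the derivatives are passed to the Gauss-Newton step.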

Case  2 


The second case for calculating the integrals needed in the translation to
the Latent Class model assumes no closed form solutions exist for the CDF’s and
numerical integration must therefore be implemented. This is the case that would
most likely be used in practice. We would like to perform the following integration:

$$\begin{aligned}
P(Y_r \in I_{rj_r} \mid x \in I_{j_x}, \underline{\beta}_r) &= \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r \\
&= \int_{l_{rj_r}}^{u_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r \\
&= F_{Y_r|x}(u_{rj_r}) - F_{Y_r|x}(l_{rj_r}),
\end{aligned}$$

however, no closed forms exist for $F_{Y_r|x}(\cdot)$. Using EGNLS to estimate the parameters
of Equation 4.2 will require derivatives with respect to $\underline{\beta}$. For this case,
the appropriate derivative for the $r'$th variable to be used in $\partial A / \partial \beta_{r'i}$ in the EGNLS
procedure is:

$$\frac{\partial}{\partial (T_{r'}^{-1})'} \int_{I_{r'j_{r'}}} f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'}) \, dy_{r'} = \int_{I_{r'j_{r'}}} \frac{\partial f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'})}{\partial (T_{r'}^{-1})'} \, dy_{r'},$$

assuming the order of differentiation and integration is interchangeable. If we
know, for example, that we are within the exponential family, then this interchangeability
is guaranteed. In applications, since the distribution for $Y_{r'}$ is known,
the derivative inside the integral can be computed and then initial starting values
are used to perform numerical integration.

Let us assume that $Y_{r'}$ is a univariate, continuous variable in the single
parameter exponential family. Then for the right hand side of the previous
equation, with $T_{r'}^{-1}(h_{r'}(x; \underline{\beta}_{r'})) = T_{r'}^{-1}$ in place of the canonical parameter,
$\theta_{r'}$, we would have

$$\begin{aligned}
\int_{I_{r'j_{r'}}} \frac{\partial f_{r'}(y_{r'} \mid x = x_{j_x}, \underline{\beta}_{r'})}{\partial T_{r'}^{-1}} \, dy_{r'}
&= \int_{I_{r'j_{r'}}} \frac{\partial \exp\left[ \{ y_{r'} T_{r'}^{-1} - b_{r'}(T_{r'}^{-1}) \} / a_{r'}(\phi_{r'}) + c_{r'}(y_{r'}, \phi_{r'}) \right]}{\partial T_{r'}^{-1}} \, dy_{r'} \\
&= \int_{I_{r'j_{r'}}} \exp\left[ \{ y_{r'} T_{r'}^{-1} - b_{r'}(T_{r'}^{-1}) \} / a_{r'}(\phi_{r'}) + c_{r'}(y_{r'}, \phi_{r'}) \right] \\
&\qquad \times \left[ \{ y_{r'} - b_{r'}'(T_{r'}^{-1}) \} / a_{r'}(\phi_{r'}) \right] dy_{r'},
\end{aligned}$$

where $b_{r'}'(T_{r'}^{-1})$ is the derivative of $b_{r'}(T_{r'}^{-1})$ with respect to $T_{r'}^{-1}$. This integral,
with specified starting values, can now be calculated using numerical integration
and thus be used in the appropriate place in calculating $\partial A / \partial \beta_{r'i}$. So, in such a case
where there are no closed forms for the CDF’s, the derivatives required in the
EGNLS procedure are pushed through the integration first, and then after the
derivatives are taken and estimates substituted for $\underline{\beta}$, numerical integration is
utilized. In applications, one of the many mathematical-type software packages
that can perform numerical integration can be used. MATLAB, MAPLE, and
MATHEMATICA are three such software packages that could be used.
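
The same calculation can be sketched in a general-purpose language. The code below is a minimal illustration, assuming composite Simpson's rule as the quadrature method and, purely so that the answer can be checked, a member of the family with $b(\theta) = \theta^2/2$ and $a(\phi) = \phi$, i.e., a normal density with mean $\theta$ and variance $\phi$. All function names are hypothetical.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * step)
    return total * step / 3.0

def density(y, theta, phi):
    """Exponential-family density exp[{y*theta - b(theta)}/a(phi) + c(y, phi)].

    Illustrative choice: b(theta) = theta^2 / 2 and a(phi) = phi, which gives
    a normal density with mean theta and variance phi.
    """
    c = -y * y / (2.0 * phi) - 0.5 * math.log(2.0 * math.pi * phi)
    return math.exp((y * theta - theta * theta / 2.0) / phi + c)

def d_density_d_theta(y, theta, phi):
    """Integrand after differentiating under the integral: f * (y - b'(theta)) / a(phi)."""
    return density(y, theta, phi) * (y - theta) / phi

def d_cell_prob_d_theta(lower, upper, theta, phi):
    """Derivative of P(lower < Y < upper) with respect to theta by numerical integration."""
    return simpson(lambda y: d_density_d_theta(y, theta, phi), lower, upper)
```

For this particular density the derivative of the cell probability is also available directly as the density at the lower endpoint minus the density at the upper endpoint, which gives a simple check on the quadrature.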

4.2  Generalization  to  Estimating  a Multivariate  Model 
Now that we have seen the procedure for estimation and the notation that is
required in the simple case of all univariate manifest variables, we shall show by
example how the grouping of a set of continuous variables that are conditionally
dependent on each other is handled. Together, these continuous variables are
conditionally independent, given $x$, of the remaining manifest variables. We shall
illustrate such grouping into conditionally independent subsets of variables in the
context of the multivariate simple linear regression model with measurement error
in $x$. Let the multivariate response vector be the outcome vector of main interest,
$\underline{Y}_{r^*}$, in the GSME model. This is an important application because it allows the
GSME model to be applied to multivariate regression situations, as discussed in
Section 3.2 of Chapter 3, where we saw the numerous applications of the GSME
model. Recall from that section in Chapter 3 that a nonlinear multivariate
regression model for a vector valued variable outcome of main interest would be:


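$$\underline{Y}_{r^*} = \begin{bmatrix} Y_{r^*1} \\ Y_{r^*2} \\ \vdots \\ Y_{r^*p_{r^*}} \end{bmatrix} = \begin{bmatrix} \mu_{r^*1}(x; \underline{\beta}_{r^*1}) \\ \mu_{r^*2}(x; \underline{\beta}_{r^*2}) \\ \vdots \\ \mu_{r^*p_{r^*}}(x; \underline{\beta}_{r^*p_{r^*}}) \end{bmatrix} + \underline{e}_{r^*},$$

where $\mu_{r^*s}(\cdot)$ is an arbitrary nonlinear mean function for the $s$th variable,
$s = 1, 2, \ldots, p_{r^*}$, and we assume $E(\underline{e}_{r^*}) = \underline{0}$ and $\mathrm{Var}(\underline{e}_{r^*}) = \Sigma_{e_{r^*}}$. We know that
$\Sigma_{e_{r^*}}$ is not a diagonal matrix due to the conditional dependence of the variables
which make up the multivariate response variable. Assume the remaining manifest
variables are univariate and that they are conditionally independent.

We know that there are $p_{r^*}$ continuous variables which are conditionally
dependent and that they are grouped together to form a multivariate manifest
vector $\underline{Y}_{r^*}$. The joint conditional distribution of the continuous variables is known
and given by $f_{r^*}(\underline{y}_{r^*} \mid \underline{\theta}_{r^*})$. Recall that the dimension of $\underline{Y}_{r^*}$ is $p_{r^*}$ and that of
$\underline{\theta}_{r^*}$ is $k_{r^*}$. The mappings between the conditional distribution parameters and the
model parameters are $\underline{\theta}_{r^*} = T_{r^*}^{-1}(h_{r^*}(x; \underline{\beta}_{r^*}))$, which were defined in Equation
3.4, and still hold. The dimension of $\underline{\beta}_{r^*}$ is $l_{r^*}$.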

For $\underline{Y}_{r^*}$ given $x$ we have $\underline{Y}_{r^*} = (Y_{r^*1}, Y_{r^*2}, \ldots, Y_{r^*p_{r^*}})'$, where each univariate
variable which makes up $\underline{Y}_{r^*}$ is categorized just as those described in
the all univariate case. The number of categories of each variable may be given
in vector notation as $\underline{C}_{r^*} = (C_{r^*1}, C_{r^*2}, \ldots, C_{r^*p_{r^*}})'$. For each $Y_{r^*s}$ define intervals
$(l_{r^*sj_{r^*s}}, u_{r^*sj_{r^*s}})$, $s = 1, 2, \ldots, p_{r^*}$, and $j_{r^*s} = 1, 2, \ldots, C_{r^*s}$. Denote
the interval defining the $j_{r^*s}$th category for variable $s$ of $\underline{Y}_{r^*}$ as $I_{r^*sj_{r^*s}}$ and the
length$(I_{r^*sj_{r^*s}})$ by $L_{r^*sj_{r^*s}}$. For each $\underline{Y}_{r^*}$, the intersection of these intervals defines a
region which can be thought of as a multidimensional interval. The region defining
the $j_{r^*1} j_{r^*2} \cdots j_{r^*p_{r^*}}$th multidimensional interval is

$$R_{r^*j_{r^*}} = (l_{r^*1j_{r^*1}}, u_{r^*1j_{r^*1}}) \times (l_{r^*2j_{r^*2}}, u_{r^*2j_{r^*2}}) \times \cdots \times (l_{r^*p_{r^*}j_{r^*p_{r^*}}}, u_{r^*p_{r^*}j_{r^*p_{r^*}}}),$$

where $j_{r^*} = j_{r^*1} j_{r^*2} \cdots j_{r^*p_{r^*}}$ and “$\times$” is defined to be the Cartesian product as
found, for example, in Khuri [34, p. 3]. Alternatively, we may write
$R_{r^*j_{r^*}} = \times_{s=1}^{p_{r^*}} I_{r^*sj_{r^*s}}$. To indicate the probability that the outcome manifest vector
of interest falls into a certain region, we write $P(\underline{Y}_{r^*} \in R_{r^*j_{r^*}})$. Note, again,
that the intervals need not be of equal length, either between or within the sets of
continuous manifest variables. Calculating the cell probabilities for $\underline{Y}_{r^*}$ is done in a
similar manner as that described earlier for a univariate continuous variable except
multivariate integration is performed. We have


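$$\begin{aligned}
P(\underline{Y}_{r^*} \in R_{r^*j_{r^*}} \mid x \in I_{j_x}, \underline{\beta}_{r^*})
&= \frac{P(\underline{Y}_{r^*} \in R_{r^*j_{r^*}}, \, x \in I_{j_x} \mid \underline{\theta}_{r^*x})}{P(x \in I_{j_x} \mid \underline{\theta}_x)} \\
&= \frac{\int_{R_{r^*j_{r^*}}} \int_{I_{j_x}} f_{r^*}(\underline{y}_{r^*}, x \mid \underline{\theta}_{r^*x}) \, dx \, d\underline{y}_{r^*}}{\int_{I_{j_x}} f(x \mid \underline{\theta}_x) \, dx} \\
&\approx \frac{\int_{R_{r^*j_{r^*}}} f_{r^*}(\underline{y}_{r^*}, x_{j_x} \mid \underline{\theta}_{r^*x}) L_{j_x} \, d\underline{y}_{r^*}}{f(x_{j_x} \mid \underline{\theta}_x) L_{j_x}} \\
&= \int_{R_{r^*j_{r^*}}} f_{r^*}(\underline{y}_{r^*} \mid x = x_{j_x}, \underline{\beta}_{r^*}) \, d\underline{y}_{r^*},
\end{aligned}$$

for $I_{j_x}$ of small length, $d\underline{y}_{r^*}$ corresponding to the multidimensional integration over
$R_{r^*j_{r^*}}$, i.e., $\int_{R_{r^*j_{r^*}}} = \int_{I_{r^*1j_{r^*1}}} \int_{I_{r^*2j_{r^*2}}} \cdots \int_{I_{r^*p_{r^*}j_{r^*p_{r^*}}}}$, and where $\underline{\theta}_{r^*x}$ are the parameters
of the joint distribution of $\underline{Y}_{r^*}$ and $x$, which are unknown. Again, it is in this
integration step where the substitution for the conditional distribution parameters,
defined by $\underline{\theta}_{r^*} = T_{r^*}^{-1}(h_{r^*}(x; \underline{\beta}_{r^*}))$, takes place to allow the model parameters, $\underline{\beta}_{r^*}$,
to enter the nonlinear functions defining the cell probabilities as functions of $\underline{\beta}_{r^*}$.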

For the remaining $p - 1$ univariate manifest variables (minus one because
the outcome of main interest is accounted for by $r^*$) order them, without loss of
generality, so that $Y_r$, $r = 1, 2, \ldots, p'$, are the continuous variables, $Y_r$,
$r = p' + 1, p' + 2, \ldots, p''$, are the “other” discrete variables, and $Y_r$, $r = p'' + 1, p'' + 2, \ldots, p - 1$,
are the multinomial categorical variables. Those remaining manifest variables
which require categorization are categorized just as described above, for the all
univariate case. The goal in the estimation procedure is still to arrive at a Latent
Class Model version of the joint marginal distribution of the manifest variables,
given in Equation 4.2. After arriving at such an analog equation, we apply EGNLS
to estimate the parameters of the model $\hat{\underline{\pi}} = \underline{\pi}(\underline{\beta}) + \underline{e}$, i.e., Equation 4.4. We have
as the probability of falling into cell $j_{r^*}, j_1 j_2 \cdots j_{p-1}$ of the multinomial distribution:

$$\begin{aligned}
\pi_{j_{r^*}, j_1 j_2 \cdots j_{p-1}}(\underline{\beta}) = \sum_{j_x} & \int_{R_{r^*j_{r^*}}} f_{r^*}(\underline{y}_{r^*} \mid x = x_{j_x}, \underline{\beta}_{r^*}) \, d\underline{y}_{r^*}
\prod_{r=1}^{p'} \int_{I_{rj_r}} f_r(y_r \mid x = x_{j_x}, \underline{\beta}_r) \, dy_r \\
& \times \prod_{r=p'+1}^{p''} \sum_{y_r \in I_{rj_r}} p_r(y_r \mid x = x_{j_x}, \underline{\beta}_r)
\prod_{r=p''+1}^{p-1} P(Y_r = j_r \mid x = x_{j_x}, \underline{\beta}_r) \, f(x_{j_x} \mid \underline{\theta}_x) L_{j_x},
\end{aligned}$$

where $I(r \le p')$ and $I(p' + 1 \le r \le p'')$ are indicator variables. As in the
univariate case, these probabilities are concatenated together and used, along
with the sample proportions, to form the model $\hat{\underline{\pi}} = \underline{\pi}(\underline{\beta}) + \underline{e}$, which is the
starting point of the EGNLS procedure. The dimension of $\underline{\pi}(\underline{\beta})$ is now
$N = (C_{r^*1} \times C_{r^*2} \times \cdots \times C_{r^*p_{r^*}}) \prod_{r=1}^{p-1} C_r$. Recall, however, that the last element is dropped
to avoid the singularity of $V$ in the estimation process. The sample proportions,
$\hat{\underline{\pi}}$, are maximum likelihood estimators for the vector of the multinomial cell
probabilities given by, in this example, $P(\underline{Y}_{r^*} \in R_{r^*j_{r^*}}, Y_1 \in I_{1j_1}, \ldots, Y_{p''} \in
I_{p''j_{p''}}, Y_{p''+1} = j_{p''+1}, \ldots, Y_{p-1} = j_{p-1})$, for all possible combinations of
cross-classifications. The estimation procedure is essentially the same as the univariate
case, from this point.
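
The bookkeeping for the cross-classification can be sketched briefly. The following minimal example, with hypothetical function names and category counts, enumerates every cell from the numbers of categories of the components of $\underline{Y}_{r^*}$ followed by those of the remaining univariate variables, and reports the dimension $N - 1$ left after the last cell is dropped.

```python
from itertools import product
from math import prod

def enumerate_cells(category_counts):
    """List every cross-classification cell as a tuple of 1-based category indices."""
    return list(product(*(range(1, c + 1) for c in category_counts)))

def number_of_free_cells(category_counts):
    """N - 1, the dimension of pi(beta) after the last cell is dropped."""
    return prod(category_counts) - 1
```

For example, with a bivariate outcome categorized into 2 and 3 levels and two further variables with 2 levels each, `enumerate_cells([2, 3, 2, 2])` lists $N = 24$ cells and `number_of_free_cells([2, 3, 2, 2])` gives 23.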

Note  that  the  univariate  methods  in  this  chapter  can  be  extended  to  the 
general  multivariate  case,  where  each  manifest  response  variable  may  be  a vector. 
The  notation,  however,  is  messy  when  the  elements  of  Y_r  are  allowed  to  be  any  mix 
of  continuous,  discrete,  and  categorical  variables. 


4.3 Asymptotic Properties

In this section, we examine the asymptotic properties and, therefore, develop a
means of inference for the estimators derived for the GSME model. The following
results hold for both examples previously given, i.e., the all univariate case and
the multivariate regression case, as well as for the model in its full generality. This
is because the theorems and their proofs do not depend on specifications of the
different models. Only minor details which have no effect on the results, such
as the value of $N$ (which is the dimension of $\underline{\pi}$), would change depending on the
model.

Recall that the main idea is to estimate the model parameters, $\underline{\beta}$, by minimizing
the quadratic form, $Q(\underline{\beta}, \hat{V})$, given in Equation 4.5. This is done by setting
$\frac{\partial}{\partial \underline{\beta}'} Q(\underline{\beta}, \hat{V})$ equal to zero and solving for $\underline{\beta}$, which is equivalent to solving the
estimating equations given in Equation 4.7 for $\underline{\beta}$, i.e., finding the solution to

$$\underline{S}_n(\underline{\beta}) = \left( \frac{\partial \underline{\pi}(\underline{\beta})}{\partial \underline{\beta}'} \right)' \hat{V}^{-1} \left( \hat{\underline{\pi}} - \underline{\pi}(\underline{\beta}) \right) = \underline{0},$$

where $\hat{V}$ is a positive definite matrix which is consistent for $V$. Recall we established
these estimating equations by letting $\underline{S}_n(\underline{\beta}) = \left[ \frac{\partial}{\partial \underline{\beta}'} Q(\underline{\beta}, \hat{V}) \right]' / (-2n)$.

The method of EGNLS requires using consistent initial estimates to obtain
favorable asymptotic properties. In the next section we discuss deriving such
consistent estimates for use in the initial step of the EGNLS procedure. It is worth
noting that $\underline{\pi}(\underline{\beta})$ does not depend on $x$, or rather $x_{j_x}$, the midpoints of the intervals
after categorization, since in deriving the joint distribution of the categorical
$\underline{Y}_1, \underline{Y}_2, \ldots, \underline{Y}_p$, summation over all $x_{j_x}$ is performed.

In  order  to  arrive  at  a lemma  which  will  be  used  in  developing  the  asymptotic 
properties  of  the  EGNLS  estimators,  we  must  consider  the  derivative,  with  respect 
to  of  the  left  hand  side  of  Equation  4.7,  which  is  proportionately  equivalent 
to  the  transpose  of  the  second  derivative  of  the  quadratic  form  Q(f3,  V ) with 
respect  to  /3,  given  earlier  in  this  chapter.  Initially,  we  must  differentiate  Sn(/3) 
with  respect  to  the  individual  elements  of  the  vector  of  parameters  /3.  For  ease 
of  notation,  in  this  section  only,  we  shall  define  these  individual  elements  a s /3j, 
i = 1,2,  ...,  v , where,  again,  the  last  kx  of  the  /%  are  representations  of  the  elements 
of  9^..  The  result  which  was  established  earlier  in  this  chapter  for  the  derivative 
of  multiply  compounded,  multivariable  functions  will  be  utilized  throughout  this 
section.  Recall  the  (N  — 1)  x v matrix  is  defined  using  that  result,  giving  as 
its  (j,  z)th  element 


will  result  in  column  vectors  which  when  concatenated  side  by  side  form  the  u x u 


Differentiating  the  left  hand  side  of  Equation  4.7  with  respect  to  /%,  i.e. 


matrix  ^rS_n((3).  Applying  the  product  rule  for  matrices  [49,  p.  329]  gives 


(4.11) 


96 


Upon  concatenation  of  all  the  resulting  column  vectors  from  the  first  piece  on  the 
right  hand  side  of  Equation  4.11  to  form  the  columns  of  a matrix,  and  similarly, 
concatenating  the  resulting  vectors  from  the  second  piece  on  the  right  hand  side 
of  Equation  4.11  into  one  matrix,  we  have  as  the  derivative  to  the  left  side  of 
Equation  4.7: 


and  7 Tj(fl)  and  7 tj  are,  respectively,  the  j th  elements  of  the  vectors  7 r(j8)  and  7?. 
Further,  z?-7  is  the  (i,  j) th  element  of  the  matrix  V-1,  and  j3a  and  @b  are  the  ath 
and  6th  elements  of  /3. 

Now  we  must  introduce  some  notation  and  definitions.  Following  Foutz 
[21],  define  ||M||  to  be  the  norm  of  an  r x r dimensional  matrix  M which  is  the 
least  upper  bound  (lub)  of  all  numbers  \Mz\,  where  z ranges  over  all  vectors  in 
Euclidean  r space,  ET,  such  that  \z\  < 1,  and  define  |z|  to  be  the  Euclidean  norm, 
or  length,  of  a vector  which  is  equal  to  (z'z)1/2  = (]T[=1  z?)1/2.  Further,  if  z is  a 
scalar,  then  \z\  is  the  absolute  value  of  z.  It  is  worth  noting  for  future  use  that  the 
norm,  Euclidean  norm,  and  absolute  value  have  similar  properties.  For  example, 
the  triangle  inequality  holds  for  all  three  (see  [21,  49,  p.  158] ) . Throughout  this 
section,  we  also  make  use  of  many  known  large  sample  theory  results.  For  example, 
the  concepts  of  order  in  probability,  op(.),  and  bounded  in  probability,  Op(.),  are 
used  and  known  properties  of  these  large  sample  concepts  may  be  applied  without 
restating  the  specific  result.  As  a specific  example,  later  in  this  section  we  use  the 


A 

gp&.(&  = S'n(§)  = -un(0)  + K(£), 


where 


the  (a,  6)th  element  of  Rn(/3)  is 


97 


property  that  if  An  = Op(fn)  and  Bn  = Op(gn),  then  An  + Bn  = Op  (ma x(/n,  gn )), 
without  specifically  stating  this  large  sample  result  prior  to  its  use.  An  informative, 
general  summary  of  these  concepts  and  their  results  may  be  found  in  Chapter  5 of 
Fuller  [22].  Agresti  [1]  provides  a review  of  some  large  sample  results  specifically  for 
multinomial  data  in  Chapter  12  of  his  book  on  categorical  data  analysis. 
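
The matrix norm defined above can be illustrated numerically. The following is a minimal sketch, assuming two arbitrary 2 x 2 matrices `A` and `B` chosen only for illustration (they do not come from the GSME model); it approximates the least upper bound of $|Mz|$ by scanning unit vectors, which suffices because $|Mz|$ is largest on the boundary $|z| = 1$, and it checks the triangle inequality mentioned in the text.

```python
import math

def euclid(z):
    """Euclidean norm |z| = (z'z)^(1/2)."""
    return math.sqrt(sum(v * v for v in z))

def matvec(M, z):
    """Matrix-vector product Mz for a list-of-rows matrix."""
    return [sum(m * v for m, v in zip(row, z)) for row in M]

def op_norm_2x2(M, grid=20000):
    """Approximate ||M|| = lub |Mz| over |z| <= 1 by scanning unit vectors."""
    best = 0.0
    for k in range(grid):
        t = 2.0 * math.pi * k / grid
        best = max(best, euclid(matvec(M, [math.cos(t), math.sin(t)])))
    return best

def matadd(A, B):
    """Elementwise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Illustrative matrices only (assumptions, not taken from the dissertation).
A = [[2.0, 1.0], [0.0, 1.0]]
B = [[0.5, -1.0], [1.5, 0.0]]
norm_A = op_norm_2x2(A)
norm_B = op_norm_2x2(B)
norm_sum = op_norm_2x2(matadd(A, B))
```

For `A`, the exact norm is the square root of the largest eigenvalue of $A'A$, which gives an independent check on the scan.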

We  now  state  and  prove  a lemma  used  to  establish  the  asymptotic  properties 
of  the  EGNLS  estimator  for  the  GSME  model. 

LEMMA 2: Let $\beta^0$ and $\pi^0$ denote the true values of $\beta$ and $\pi(\beta)$. Assume that in an open neighborhood of $\beta^0$, defined by $D_\delta = \{\beta : |\beta - \beta^0| < \delta\}$, the following regularity conditions hold, for all $\beta \in D_\delta$:

1. $\pi_j(\beta) > 0$, $j = 1, 2, \ldots, N$, where the calculation of $N$ may vary in different applications. For example, we saw in the univariate case that $N = \prod_{r} C_r$, whereas in the more general case shown for multivariate regression of the outcome vector of interest, $N = (C_{r^*1} \times C_{r^*2} \times \cdots \times C_{r^*p_{r^*}}) \prod_{r} C_r$,

2. the second order partial derivatives of $\pi(\beta)$ exist and are continuous,

3. the Jacobian matrix $(\partial \pi(\beta)/\partial \beta')$ has full column rank $v = (l_1 + l_2 + \cdots + l_p + k_x)$, where, recall, $l_r$ is the dimension of $\beta_r$, for each $r$, and $k_x$ is the dimension of $\theta_x$.

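Then

(A) $S_n(\beta^0) \xrightarrow{P_{\beta^0}} 0$, as $n \to \infty$,

(B) $U_n(\beta^0) \xrightarrow{P_{\beta^0}} U(\beta^0)$ and $U(\beta)$ is positive definite for all $\beta \in D_\delta$ and is continuous at $\beta^0$, and

(C) $\sup_{\beta \in D_\delta} \|S_n'(\beta) + U(\beta)\| \xrightarrow{P_{\beta^0}} 0$, as $n \to \infty$.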

Proof: (A) We have that $S_n(\beta^0) = \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right)' \hat{V}^{-1} \left(\hat{\pi} - \pi(\beta^0)\right)$, where it is known that

\[
\hat{\pi} \xrightarrow{P_{\beta^0}} \pi(\beta^0),
\]

as $n \to \infty$. Further, due to regularity condition 2, the elements of $\partial \pi(\beta)/\partial \beta'$ exist and are continuous and bounded in $D_\delta$, therefore the elements of $\partial \pi(\beta^0)/\partial \beta'$ are bounded. Moreover, $\hat{V}^{-1}$ converges to $V^{-1}$ in probability, as $n \to \infty$, because $\hat{V}$ converges in probability to $V$ and the elements of $\hat{V}^{-1}$ are continuous functions of the elements of $\hat{V}$. Since $V^{-1}$ is a positive definite matrix, $\hat{V}^{-1}$ must be bounded, with probability going to one, for all $n$ greater than some sufficiently large $N_0$. Therefore, $S_n(\beta^0) \xrightarrow{P_{\beta^0}} 0$, as $n \to \infty$.


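(B) We have that

\[
U_n(\beta^0) = \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right)' \hat{V}^{-1} \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right),
\]

where $\partial \pi(\beta^0)/\partial \beta'$ does not depend on $n$ and has elements which are bounded (by condition 2) and $\hat{V}^{-1}$ converges to $V^{-1}$ in probability, as $n \to \infty$, as shown in the proof of (A). Therefore

\[
U_n(\beta^0) \xrightarrow{P_{\beta^0}} U(\beta^0),
\]

where

\[
U(\beta^0) = \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right)' V^{-1} \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right).
\]

Further, we know that $V^{-1}$ is positive definite and, by condition 3, the Jacobian has full column rank; thus, for all $z \ne 0$,

\[
z' \left[U(\beta)\right] z = z' \left(\frac{\partial \pi(\beta)}{\partial \beta'}\right)' V^{-1} \left(\frac{\partial \pi(\beta)}{\partial \beta'}\right) z > 0.
\]

Therefore, $U(\beta)$ is positive definite for all $\beta$ in a neighborhood of $\beta^0$, i.e., for all $\beta \in D_\delta$.

We know $\partial \pi(\beta)/\partial \beta'$ does not depend on $n$ and, by condition 2, it exists and is continuous for all $\beta \in D_\delta$, therefore it must be continuous at $\beta^0$, where obviously $\beta^0 \in D_\delta$. Furthermore, since $V^{-1}$ does not depend on $\beta$, we have that $U(\beta)$ is continuous at $\beta^0$.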

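(C) We know that $S_n'(\beta) = -U_n(\beta) + R_n(\beta)$ so we may write

\[
S_n'(\beta) + U(\beta) = -U_n(\beta) + U(\beta) + R_n(\beta),
\]

where $-U_n(\beta) + U(\beta) = o_p(1)$ for all $\beta \in D_\delta = \{\beta : |\beta - \beta^0| < \delta\}$, because

\[
-U_n(\beta) + U(\beta) = \left(\frac{\partial \pi(\beta)}{\partial \beta'}\right)' \left[-\hat{V}^{-1} + V^{-1}\right] \left(\frac{\partial \pi(\beta)}{\partial \beta'}\right)
\]

and $-\hat{V}^{-1} + V^{-1} = o_p(1)$, as shown in the proof of (A). Thus $-U_n(\beta) + U(\beta) = o_p(1)$, for all $\beta \in D_\delta$, and we may conclude that $\sup_{\beta \in D_\delta} \|-U_n(\beta) + U(\beta)\| = o_p(1)$ because the norm is a continuous function of the elements of $-U_n(\beta) + U(\beta)$ and the norm of $-U_n(\beta) + U(\beta)$ being $o_p(1)$ for all $\beta \in D_\delta$ implies the norm must have the same order for the sup over all $\beta \in D_\delta$.

Therefore, the probability limit of $\sup_{\beta \in D_\delta} \|S_n'(\beta) + U(\beta)\|$ is zero if the probability limit of $\sup_{\beta \in D_\delta} \|R_n(\beta)\|$ is zero because

\[
\begin{aligned}
\sup_{\beta \in D_\delta} \|S_n'(\beta) + U(\beta)\| &= \sup_{\beta \in D_\delta} \|-U_n(\beta) + U(\beta) + R_n(\beta)\| \\
&\le \sup_{\beta \in D_\delta} \|-U_n(\beta) + U(\beta)\| + \sup_{\beta \in D_\delta} \|R_n(\beta)\| \\
&= o_p(1) + \sup_{\beta \in D_\delta} \|R_n(\beta)\|.
\end{aligned}
\]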


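Hence, it remains to show that $\sup_{\beta \in D_\delta} \|R_n(\beta)\| \xrightarrow{P_{\beta^0}} 0$, as $n \to \infty$, in order to prove (C). Recall that one element of $R_n(\beta)$ takes the form $\sum_i \sum_j \hat{v}^{ij} \left(\hat{\pi}_j - \pi_j(\beta)\right) \left(\frac{\partial^2 \pi_i(\beta)}{\partial \beta_a \, \partial \beta_b}\right)$. We know that $\hat{\pi}_j \xrightarrow{P_{\beta^0}} \pi_j(\beta^0)$, as $n \to \infty$, for all $j$. Thus for any positive $\delta'$ and $\epsilon$ there exists an $N_0$ such that for all $n > N_0$,

\[
P\left(|\hat{\pi}_j - \pi_j(\beta^0)| > \epsilon/2\right) < \delta'. \qquad (4.12)
\]

Let $\delta$ be such that $|\pi_j(\beta) - \pi_j(\beta^0)| < \epsilon/2$ when $|\beta - \beta^0| < \delta$. We know such a $\delta$ exists for any $\epsilon > 0$ because $\pi(\beta)$ is continuous on $D_\delta$. Then by Equation 4.12, for all $n > N_0$:

\[
\begin{aligned}
& P\left(|\hat{\pi}_j - \pi_j(\beta^0)| > \epsilon - |\pi_j(\beta) - \pi_j(\beta^0)|\right) < \delta' \\
\Rightarrow\ & P\left(|\hat{\pi}_j - \pi_j(\beta^0)| + |\pi_j(\beta^0) - \pi_j(\beta)| > \epsilon\right) < \delta' \\
\Rightarrow\ & P\left(\left|\left(\hat{\pi}_j - \pi_j(\beta^0)\right) + \left(\pi_j(\beta^0) - \pi_j(\beta)\right)\right| > \epsilon\right) < \delta' \\
\Rightarrow\ & P\left(|\hat{\pi}_j - \pi_j(\beta)| > \epsilon\right) < \delta'.
\end{aligned}
\]

So, for each element of $R_n(\beta)$, $\sum_i \sum_j \frac{\partial^2 \pi_i(\beta)}{\partial \beta_a \, \partial \beta_b} \hat{v}^{ij} \left(\hat{\pi}_j - \pi_j(\beta)\right)$, we have that $|\hat{\pi}_j - \pi_j(\beta)|$ converges in probability to zero, $\hat{v}^{ij}$ is an element of $\hat{V}^{-1}$, which was shown in the proof of (A) to be bounded (with probability going to one as $n \to \infty$), and by condition 2 the derivatives $\partial^2 \pi_i(\beta)/\partial \beta_a \, \partial \beta_b$ are bounded for all $i$ and $\beta \in D_\delta$. Therefore, $R_n(\beta) \xrightarrow{P_{\beta^0}} 0$, for all $\beta \in D_\delta$, and thus the sup over all $\beta \in D_\delta$ must converge in probability to zero. This completes the proof. ◊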

Theorem 3 and its proof are based on similar ones found in Shah [52, p. 39] for situations involving diagnostic tests, adapted here for the GSME model.
The  theorem  states  the  existence  and,  of  greater  importance  for  use  with  the 
theorems  that  follow,  the  uniqueness  of  a weakly  consistent  sequence  of  solutions 
to  Equation  4.7  with  limiting  probability  one.  The  key  to  the  uniqueness  of  the 
consistent  estimator  is  that  any  other  consistent  estimator  that  solves  Equation 
4.7  is  equal  to  it,  with  probability  converging  to  one,  and  therefore  must  have 
the  same  asymptotic  distribution.  The  proof  utilizes  a version  of  the  Inverse 
Function  Theorem  ([48,  p.  221];  a simplistic  interpretation  of  this  version  of  the 
Inverse  Function  Theorem  is  that  the  inverse  of  a function  exists  provided  its 
derivative  has  an  inverse  which  exists,  i.e.,  is  positive  or  negative  definite,  at  some 
point  within  the  chosen  neighborhood).  The  theorem  and  proof  by  Shah  for  the 
diagnostic  testing  situation  are  based  on  an  original  theorem  and  proof  by  Foutz 
[21]  concerning  unique  consistent  solutions  to  likelihood  equations. 

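THEOREM 3: Given the regularity conditions of Lemma 2, with probability going to one as $n \to \infty$, there exists a sequence of solutions to Equation 4.7, $\{\hat{\beta}_n\}$, such that

\[
\hat{\beta}_n \xrightarrow{P_{\beta^0}} \beta^0,
\]

as $n \to \infty$. Further, if $\{\tilde{\beta}_n\}$ is any other consistent sequence of solutions to Equation 4.7, then

\[
P_{\beta^0}\left(\tilde{\beta}_n \ne \hat{\beta}_n\right) \to 0
\]

as $n \to \infty$.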

Proof: By result (B) of Lemma 2, we know $U(\beta^0)$ is positive definite. Therefore, we may define $\lambda = 1/(4 \|U(\beta^0)^{-1}\|)$. Further, we know that $S_n'(\beta^0)$ is arbitrarily close to a negative definite matrix. This can be seen easily since $S_n'(\beta^0) = -U_n(\beta^0) + R_n(\beta^0)$, where by result (B) of Lemma 2, $U_n(\beta^0)$ converges in probability to $U(\beta^0)$, which is positive definite, and $R_n(\beta^0) = O_p(n^{-1/2})$ because $\hat{\pi}_j - \pi_j(\beta^0) = O_p(n^{-1/2})$, for all $j$, $\partial^2 \pi_i(\beta)/\partial \beta_a \, \partial \beta_b$ exists and is continuous by regularity condition 2, and $\hat{v}^{ij}$ is finite, for all $i$, $j$, because it was shown in the proof of Lemma 2 that the elements of $\hat{V}^{-1}$ are bounded. Therefore, define $\lambda_n = 1/(4 \|S_n'(\beta^0)^{-1}\|)$, whenever $S_n'(\beta^0)$ is negative definite. Using result (C) of Lemma 2, then, we have $\lambda_n \to \lambda$ in probability.

Also by result (B), we have that $U(\beta)$ is continuous at $\beta^0$ and one can then choose a $\delta > 0$ sufficiently small so that

\[
\|U(\beta) - U(\beta^0)\| < \lambda/2
\]

and, by part (C) of Lemma 2,

\[
\lim_{n \to \infty} P_{\beta^0}\left(\sup_{\beta \in D_\delta} \|S_n'(\beta) + U(\beta)\| > \lambda/4\right) = 0
\]

whenever $|\beta - \beta^0| < \delta$.

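In the following steps, $S_n'(\beta)$ is shown to be arbitrarily close to $S_n'(\beta^0)$, which was shown above to be arbitrarily close to a negative definite matrix. This will then allow us to apply the Inverse Function Theorem, with probability tending to one. So, for $\beta \in D_\delta$, i.e., $|\beta - \beta^0| < \delta$, we have

\[
\begin{aligned}
\|S_n'(\beta) - S_n'(\beta^0)\| &\le \|S_n'(\beta) + U(\beta)\| + \|-S_n'(\beta^0) - U(\beta^0)\| + \|-U(\beta) + U(\beta^0)\| \\
&< (\lambda/4) + (\lambda/4) + (\lambda/2) \\
&= \lambda,
\end{aligned}
\]

with probability going to one as $n \to \infty$. Further, because $\lambda_n \to \lambda$ in probability, we know

\[
\begin{aligned}
P_{\beta^0}\left(\|S_n'(\beta) - S_n'(\beta^0)\| > 2\lambda_n\right) &= P_{\beta^0}\left(\|S_n'(\beta) - S_n'(\beta^0)\| > 2\lambda_n,\ \lambda_n \ge \lambda/2\right) \\
&\quad + P_{\beta^0}\left(\|S_n'(\beta) - S_n'(\beta^0)\| > 2\lambda_n,\ \lambda_n < \lambda/2\right) \\
&\le P_{\beta^0}\left(\|S_n'(\beta) - S_n'(\beta^0)\| > \lambda\right) + P_{\beta^0}\left(\lambda_n < \lambda/2\right) \\
&\to 0 + 0 = 0,
\end{aligned}
\]

as $n \to \infty$.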

Now we may apply the appropriate version of the Inverse Function Theorem [48, p. 221] to ensure, with probability approaching one, that $S_n$ is a one-to-one function from $D_\delta$ onto $S_n(D_\delta)$ and that $S_n(D_\delta)$, the image set, contains the open neighborhood of radius $\lambda_n \delta/2$ about $S_n(\beta^0)$. Since $\lambda_n \delta/2 \to \lambda \delta/2$ in probability, there is also an open neighborhood of radius $\lambda \delta/2$ around $S_n(\beta^0)$, with probability going to one, in the image set $S_n(D_\delta)$.

Further, by (A) from Lemma 2, with probability going to one as $n \to \infty$, we know that $\|S_n(\beta^0) - 0\| < \lambda \delta/2$, and thus $0 \in S_n(D_\delta)$, with probability going to one as $n \to \infty$.

Let us now consider the inverse function $S_n^{-1} : S_n(D_\delta) \to D_\delta$, which is well defined whenever $S_n$ is one-to-one and onto, which holds with probability going to one. Finally, since $0 \in S_n(D_\delta)$, with probability going to one as $n \to \infty$, we may conclude:

1. that the root, $S_n^{-1}(0)$, of Equation 4.7 exists in $D_\delta$, with probability going to one as $n \to \infty$; this is due to the fact that $S_n^{-1}$ exists and is well defined on $S_n(D_\delta)$ and therefore may be applied to $0$, which is shown to exist in $S_n(D_\delta)$, the image set.

2. $S_n^{-1}(0)$ converges in probability to $\beta^0$, since $\delta$ may be taken to be arbitrarily small; this is because it was just shown in Part 1 that, with probability going to one as $n \to \infty$, $S_n^{-1}(0)$ exists in $D_\delta$ and, since $\delta$ can be arbitrarily small, $S_n^{-1}(0)$ can get arbitrarily close to the center of $D_\delta$, which is $\beta^0$.

3. by the one-to-oneness of $S_n$ on $D_\delta$, any other sequence $\{\tilde{\beta}_n\}$ of solutions to Equation 4.7 necessarily must lie outside of $D_\delta$ with probability going to one, i.e., $\{\tilde{\beta}_n\}$ does not converge to $\beta^0$. This is easily seen since if $\hat{\beta}_n \in D_\delta$ is a solution to Equation 4.7, then $S_n(\hat{\beta}_n) = 0$ and this implies that $\hat{\beta}_n = S_n^{-1}(0)$, as it has been established that $S_n^{-1}$ exists on $S_n(D_\delta)$. However, if $\tilde{\beta}_n \in D_\delta$ is also a solution to Equation 4.7, then we have $S_n(\tilde{\beta}_n) = 0$, which implies $\tilde{\beta}_n = S_n^{-1}(0)$. The fact that $S_n$ is a one-to-one function on $D_\delta$ means that the preimages of zero must be the same, i.e., $\tilde{\beta}_n = \hat{\beta}_n$ with probability going to one.

The proof is complete upon taking $\hat{\beta}_n = S_n^{-1}(0)$. ◊

The uniqueness established in Theorem 3 is of notable importance: any other weakly consistent estimator that solves Equation 4.7 must equal this one (with probability tending to one) and must therefore have the same asymptotic properties, e.g., distribution, as the estimator whose existence was established.

The  following  theorem  shows  that  given  a consistent  initial  estimate,  the 
estimate  derived  from  the  modified  Gauss-Newton  method  used  in  the  EGNLS 
procedure  will  also  be  consistent.  In  addition,  the  theorem  provides  a condition 
which  produces  a consistent  estimate,  even  when  no  consistent  initial  estimate 
exists. 




THEOREM 4: Let $a_n$ be a sequence of constants such that $a_n \to 0$ as $n \to \infty$. Suppose that $(\beta_T - \beta^0) = O_p(a_n)$. Let $\hat{\beta}$ denote the EGNLS estimator obtained from the modified Gauss-Newton algorithm. Then, under the regularity conditions stated for Lemma 2,

\[
(\hat{\beta} - \beta^0) = O_p\left(\max(a_n, n^{-1/2})\right).
\]

Further, if it is known that no local minima exist to the quadratic form of Equation 4.5, then the resulting EGNLS estimator from the modified Gauss-Newton algorithm is such that $(\hat{\beta} - \beta^0) = O_p(n^{-1/2})$, regardless of starting with a consistent estimator or not.

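Proof: The proof for the first result of the theorem directly follows that of a similar theorem in a Ph.D. dissertation by Shah [52, p. 31] for binary variables in diagnostic tests. First, consider the one step estimator $\beta_M$ of the Gauss-Newton procedure defined in Equation 4.8, having $\beta_T$ as an initial estimator and $\beta^0$ denoting the true parameter value.

Note that $\hat{\pi} - \pi(\beta^0) = O_p(n^{-1/2})$ and $\pi(\beta_T) = \pi(\beta^0) + O_p(a_n)$ since $(\beta_T - \beta^0) = O_p(a_n)$ and by the continuity of the vector $\pi$. Therefore, $\hat{\pi} - \pi(\beta_T)$ is $O_p(\max(a_n, n^{-1/2}))$. Also, we know $\hat{V}$ is consistent for $V$ and the inverse, $\hat{V}^{-1}$, is a continuous function of the elements of $\hat{V}$. Further, the matrix $\partial \pi(\beta)/\partial \beta'$ is continuous in $\beta$ (by regularity condition 2 of Lemma 2), so the term

\[
\left[\left(\frac{\partial \pi(\beta_T)}{\partial \beta'}\right)' \hat{V}^{-1} \left(\frac{\partial \pi(\beta_T)}{\partial \beta'}\right)\right]^{-1} \left(\frac{\partial \pi(\beta_T)}{\partial \beta'}\right)' \hat{V}^{-1}
\]

converges to

\[
\left[\left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right)' V^{-1} \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right)\right]^{-1} \left(\frac{\partial \pi(\beta^0)}{\partial \beta'}\right)' V^{-1}
\]

in probability, which is bounded and does not depend on $n$. Thus, from Equation 4.8 we may write

\[
\beta_M = \beta_T + O_p\left(\max(a_n, n^{-1/2})\right),
\]

and further,

\[
\begin{aligned}
(\beta_M - \beta^0) &= (\beta_T - \beta^0) + O_p\left(\max(a_n, n^{-1/2})\right) \\
&= O_p\left(\max(a_n, n^{-1/2})\right).
\end{aligned}
\]

Now, since each newly iterated estimate also satisfies the same property by similar arguments, the first part of the proof is complete.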

The second result of the theorem follows by noting that if only a global minimum of Equation 4.5, i.e., the quadratic form $Q(\beta, \hat{V})$, exists, then no matter where the modified Gauss-Newton procedure begins, the process must converge to that global minimum. This is due to the fact that the modified Gauss-Newton procedure guarantees convergence of the algorithm. ◊

The key result from Theorem 4 is that the estimator derived using the EGNLS method to estimate the parameters of the GSME model provides consistent estimates assuming a consistent initial estimate exists. A consistent initial estimate for the EGNLS procedure, i.e., for use in the one-step estimation Equation 4.8, is guaranteed, except on a set whose probability tends to zero, since Theorem 3 states the existence of a consistent estimator, with probability tending to one as $n \to \infty$, which is a solution to Equation 4.7. A result of greater importance is that since the EGNLS estimator is consistent, the uniqueness proven in Theorem 3 implies the consistent solution to the EGNLS procedure must have the same asymptotic properties as the weakly consistent estimate that solves Equation 4.7.




The second result of Theorem 4 states that a consistent estimate from the modified Gauss-Newton procedure can be reached even if one does not start with a consistent initial estimate, as long as it is known that no local minima to Equation 4.5, i.e., $Q(\beta, \hat{V})$, exist. As stated earlier in the chapter, it is known that the modified Gauss-Newton procedure will converge [31, p. 269], which is why it is preferred here over the standard Gauss-Newton procedure, where there may be no guarantee of convergence (see, for example, [31, 64]). Therefore, if only a global minimum exists, the modified Gauss-Newton procedure is guaranteed to find it.
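
To make the role of the step modification concrete, the following is a minimal sketch of a Gauss-Newton iteration with step halving. It rests on several assumptions that are not from this dissertation: a hypothetical one-parameter, three-cell model `pi_model`, illustrative sample proportions `p_hat`, and a diagonal weight vector `w` standing in for $\hat{V}^{-1}$. The step is halved until the quadratic form does not increase, which is one common way to modify Gauss-Newton; the exact modification in [31] may differ.

```python
def pi_model(b):
    """Hypothetical cell probabilities pi(beta) for a 3-cell table."""
    return [b * b, 2.0 * b * (1.0 - b), (1.0 - b) ** 2]

def jacobian(b):
    """Derivative d pi(beta) / d beta for the toy model."""
    return [2.0 * b, 2.0 - 4.0 * b, -2.0 * (1.0 - b)]

def Q(b, p_hat, w):
    """Quadratic form (p_hat - pi)' W (p_hat - pi) with diagonal W."""
    return sum(wj * (pj - mj) ** 2
               for wj, pj, mj in zip(w, p_hat, pi_model(b)))

def modified_gauss_newton(b, p_hat, w, tol=1e-10, max_iter=100):
    """Gauss-Newton with step halving so that Q never increases."""
    for _ in range(max_iter):
        d = jacobian(b)
        r = [pj - mj for pj, mj in zip(p_hat, pi_model(b))]
        u = sum(wj * dj * dj for wj, dj in zip(w, d))         # D' W D
        s = sum(wj * dj * rj for wj, dj, rj in zip(w, d, r))  # D' W r
        step = s / u
        frac = 1.0
        while frac > 1e-8:
            cand = b + frac * step
            if 0.0 < cand < 1.0 and Q(cand, p_hat, w) <= Q(b, p_hat, w):
                break
            frac /= 2.0
        else:
            return b  # no acceptable step was found
        if abs(cand - b) < tol:
            return cand
        b = cand
    return b

# Illustrative inputs (assumptions): proportions and diagonal weights.
p_hat = [0.10, 0.40, 0.50]
w = [1.0 / p for p in p_hat]
beta_hat = modified_gauss_newton(0.9, p_hat, w)
```

Starting deliberately far from the solution (`0.9`) shows the step control at work; in this toy problem the iteration settles near 0.3.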

This  part  of  the  theorem  provides  yet  another  strength  to  the  methods 
developed  for  the  GSME  model.  As  often  is  the  case  in  real  world  situations,  one 
may  never  be  certain  if  some  chosen  starting  values  are  consistent  for  the  values 
of  the  unknown  population  parameters.  Therefore,  in  practice,  if  the  existence 
of  a consistent  initial  estimate  is  in  question,  the  EGNLS  procedure  can  be 
performed  with  multiple  starting  values  from  a wide  grid,  and  if  those  different, 
not  necessarily  consistent,  starting  values  result  in  the  same  estimate,  it  can 
rather  safely  be  assumed  that  there  are  no  local  minima  and  that  all  of  the  initial 
estimates  used  for  the  EGNLS  procedure  converged  to  a single  global  minimum. 
Note  that  if  local  minima  exist,  then  the  same  conclusion  cannot  be  drawn.  The 
existence  of  local  minima  would  contradict  the  uniqueness  result  from  Theorem  3, 
therefore  the  resulting  estimators  would  differ  (in  probability)  and  thus  they  would 
follow  different  asymptotic  distributions  and  not  be  applicable  to  the  results  given 
here. 
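
The multiple-starting-value check described above can be sketched as follows. This is only an illustration under stated assumptions: `q_single` is a quadratic form for a hypothetical three-cell model, `q_double` is an artificial objective with two local minima included for contrast, and `local_min` is a crude descent with step halving that stands in for the modified Gauss-Newton search.

```python
def q_single(b):
    """Quadratic form for a hypothetical model pi(b) = (b^2, 2b(1-b), (1-b)^2)."""
    p_hat = [0.10, 0.40, 0.50]
    pi = [b * b, 2.0 * b * (1.0 - b), (1.0 - b) ** 2]
    return sum((p - m) ** 2 / p for p, m in zip(p_hat, pi))

def q_double(b):
    """Artificial objective with two local minima (near 0.2 and 0.8)."""
    return (b - 0.2) ** 2 * (b - 0.8) ** 2 + 0.01 * b

def local_min(q, b, lo=0.01, hi=0.99, iters=2000, h=1e-6):
    """Descent with step halving; accepts a step only if q decreases."""
    for _ in range(iters):
        g = (q(b + h) - q(b - h)) / (2.0 * h)  # central-difference slope
        step = 1.0
        while step > 1e-12:
            cand = min(max(b - step * g, lo), hi)
            if q(cand) < q(b):
                break
            step /= 2.0
        else:
            return b  # no decreasing step exists: treat b as a minimum
        b = cand
    return b

def same_estimate(q, starts, tol=1e-4):
    """Run the search from every start and report whether all runs agree."""
    ests = [local_min(q, s) for s in starts]
    return max(ests) - min(ests) < tol, ests

starts = [0.1, 0.3, 0.5, 0.7, 0.9]  # a coarse grid of starting values
single_ok, single_ests = same_estimate(q_single, starts)
double_ok, double_ests = same_estimate(q_double, starts)
```

Agreement across the grid (`single_ok`) supports, but does not prove, the absence of local minima; disagreement (`double_ok` false) shows that at least two minima exist.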

In the next and final section in this chapter, we discuss how consistent initial estimates may be obtained under the GSME model. Since these initial estimates are based on sample proportions, $\hat{\pi}$, where $\hat{\pi} - \pi(\beta)$ is $O_p(n^{-1/2})$, using these estimates as starting values, the modified Gauss-Newton procedure will produce an estimator, $\hat{\beta}$, such that $(\hat{\beta} - \beta^0) = O_p(n^{-1/2})$.




The  following  theorem  states  the  large  sample  distributional  properties  of 
solutions  to  the  estimating  equations  of  Equation  4.7  and,  thus,  are  also  the 
properties  of  the  EGNLS  estimator.  Again,  the  theorem  and  proof  are  based  on 
those  by  Shah  [52,  p.  43]  for  diagnostic  tests,  but  adapted  here  for  the  GSME 
model. 

THEOREM 5: Assume that the regularity conditions of Lemma 2 hold and let $\{\hat{\beta}_n\}$ be a weakly consistent sequence of estimators that, with probability tending to one as $n \to \infty$, satisfy Equation 4.7 and $\hat{\beta}_n - \beta^0 = O_p(n^{-1/2})$. Then

\[
\sqrt{n}\left(\hat{\beta}_n - \beta^0\right) \xrightarrow{L_{\beta^0}} MVN_v\left(0, U(\beta^0)^{-1}\right),
\]

where recall that $v$ is the dimension of $\beta^0$.

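Proof: Given $\epsilon > 0$, Theorem 3 implies that, with probability going to one as $n \to \infty$, there exists an $n_0$ such that for $n > n_0$, $\hat{\beta}_n \in D_\delta$ and $\hat{\beta}_n$ satisfies Equation 4.7 with $\lim_{n \to \infty} P\left(S_n(\hat{\beta}_n) = 0\right) = 1$. Now, using a Taylor series (see, for example, [50, p. 17]), expand the vector $S_n(\hat{\beta}_n)$ about $\beta^0$, and thus for $\hat{\beta}_n \in D_\delta$, we have

\[
\begin{aligned}
0 &= S_n(\hat{\beta}_n) \\
&= S_n(\beta^0) + S_n'(\beta^0)\left(\hat{\beta}_n - \beta^0\right) + O_p(n^{-1}),
\end{aligned}
\]

with probability tending to one as $n \to \infty$. Therefore,

\[
P_{\beta^0}\left\{\lim_{n \to \infty} \left[n^{1/2} S_n(\beta^0) + n^{1/2} S_n'(\beta^0)\left(\hat{\beta}_n - \beta^0\right)\right] = 0\right\} = 1.
\]

By results (B) and (C) of Lemma 2,

\[
S_n'(\beta^0) \xrightarrow{P_{\beta^0}} -U(\beta^0),
\]

which implies

\[
P_{\beta^0}\left\{\lim_{n \to \infty} \left[n^{1/2} S_n(\beta^0) + n^{1/2} \left[-U(\beta^0)\right]\left(\hat{\beta}_n - \beta^0\right)\right] = 0\right\} = 1,
\]

and, thus,

\[
P_{\beta^0}\left\{\lim_{n \to \infty} \left[n^{1/2} \left[-U(\beta^0)^{-1}\right] S_n(\beta^0) + n^{1/2}\left(\hat{\beta}_n - \beta^0\right)\right] = 0\right\} = 1.
\]

Therefore, we know that, with probability going to one as $n \to \infty$, the limiting distribution of $n^{1/2}(\hat{\beta}_n - \beta^0)$ is the same as that of $n^{1/2} U(\beta^0)^{-1} S_n(\beta^0)$. Using the Multivariate Central Limit Theorem [45, p. 128] and the Delta Method (see, for example, [1, p. 56-58]) yields

\[
n^{1/2} S_n(\beta^0) \xrightarrow{L_{\beta^0}} MVN_v\left(0, U(\beta^0)\right).
\]

The result, then, follows. ◊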
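
The final step of the proof can be spelled out as a short worked calculation. This is a sketch of the standard argument, assuming only what the proof states: $U(\beta^0)$ is symmetric and positive definite, and $Z$ denotes the limiting normal vector of $n^{1/2} S_n(\beta^0)$. The sign of the matrix multiplying $S_n(\beta^0)$ does not affect the covariance, since the limit is symmetric about zero.

```latex
\[
n^{1/2}\left(\hat{\beta}_n - \beta^0\right)
  = U(\beta^0)^{-1}\, n^{1/2} S_n(\beta^0) + o_p(1),
\qquad
n^{1/2} S_n(\beta^0) \xrightarrow{L} Z \sim MVN_v\left(0, U(\beta^0)\right),
\]
\[
\operatorname{Var}\left(U(\beta^0)^{-1} Z\right)
  = U(\beta^0)^{-1}\, U(\beta^0)\, \left(U(\beta^0)^{-1}\right)'
  = U(\beta^0)^{-1}.
\]
```

A linear transformation of a multivariate normal vector is again multivariate normal, so the limit is $MVN_v(0, U(\beta^0)^{-1})$, as the theorem states.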

Note that after the last step in the modified Gauss-Newton algorithm, the asymptotic variance-covariance matrix of the EGNLS estimates can be directly estimated. Let $\hat{\beta}$ represent the EGNLS estimators; then the approximate asymptotic variance-covariance matrix of $\hat{\beta}$ after the last step of the Gauss-Newton algorithm is estimated by

\[
\left[\left(\frac{\partial \pi(\hat{\beta})}{\partial \beta'}\right)' \hat{V}^{-1} \left(\frac{\partial \pi(\hat{\beta})}{\partial \beta'}\right)\right]^{-1}.
\]




It was shown by Shah [52, p. 44] that, for diagnostic tests with binary variables, the EGNLS estimators are asymptotically efficient when compared to the maximum likelihood estimators. We cannot arrive at that same result here. In
applying  EGNLS  to  the  GSME  model,  we  know  that  efficiency  has  been  lost  due  to 
the  categorization  of  manifest  variables  and  categorization  of  x.  We  shall  discuss 
possible  future  work  concerning  the  asymptotic  efficiency  of  the  estimators  for  the 
GSME  model  in  the  final  chapter  of  this  dissertation. 

4.4  Consistent  Initial  Estimates 

In the previous section, theorems were stated and proven showing that the EGNLS method produces parameter estimators which are consistent and asymptotically normal for the GSME model, given that consistent initial estimates are used in the estimation procedure. Recall that the vector of model parameters is

\[
\beta = \left(\beta_1', \beta_2', \ldots, \beta_p', \theta_x'\right)'.
\]

In this section we describe a method to derive consistent estimates to be used as initial trial values in EGNLS.

By taking a just-identified subset of invertible equations from Equation 4.4, i.e., $\hat{\pi} = \pi(\beta) + \epsilon$, one can solve for the elements of $\beta$ in terms of sample proportions, $\hat{\pi}$, which are consistent estimates of the true cell probabilities, $\pi$. The dimension of $\hat{\pi}$ is $N - 1$, so as long as the number of elements in $\beta$, which is equal to $v = (l_1 + l_2 + \cdots + l_p + k_x)$, is less than $N$ we can take a just-identified subset of equations and solve for our initial estimates, which will be consistent. Recall that the calculation of $N$ may vary for different applications.
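
As a toy illustration of solving a just-identified subset of equations, the sketch below assumes a hypothetical one-parameter, three-cell model with $\pi_1(\beta) = \beta^2$ (this model and the value `beta0 = 0.3` are assumptions for illustration, not the GSME model). Inverting that single equation gives an initial estimate $\sqrt{\hat{\pi}_1}$, which inherits consistency from the sample proportion.

```python
import math
import random

def sample_proportions(probs, n, rng):
    """Draw n multinomial observations and return the cell proportions."""
    counts = [0] * len(probs)
    for _ in range(n):
        u, acc = rng.random(), 0.0
        for j, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[j] += 1
                break
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return [c / n for c in counts]

def initial_estimate(p_hat):
    """Invert the single equation pi_1 = beta^2 (a just-identified subset)."""
    return math.sqrt(p_hat[0])

rng = random.Random(2003)  # fixed seed so the sketch is reproducible
beta0 = 0.3                # hypothetical true parameter value
probs = [beta0 ** 2, 2.0 * beta0 * (1.0 - beta0), (1.0 - beta0) ** 2]
estimates = {n: initial_estimate(sample_proportions(probs, n, rng))
             for n in (100, 10000)}
```

With the larger sample, the initial estimate should typically lie closer to `beta0`, which is the behavior a consistent starting value is meant to have.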

This method of using a just-identified subset of equations to derive initial estimates relates directly to the use of an invertible subset of equations used in Section 3.3 of Chapter 3 on identifiability. Theorem 2 of that section establishes the identifiability of the GSME model through the existence of an invertible subset of equations from the parameters $\omega(\beta)$ of the joint distribution of the manifest variables, $f_Y(y; \omega(\beta))$. Recall that Theorem 2 stated that $\beta$ is identified, and thus the GSME model is identified, if there exists a proper subvector of $\omega$, say $\omega_0$, such that $\omega_0^{-1}(\omega_0(\beta)) = \beta$. This means that when such an $\omega_0$ exists, we know an invertible, proper subset of $\pi$ does as well. This is because $\pi$ is the vector of parameters of the joint marginal distribution of the manifest variables (given by $\omega$ in Section 3.3 on identification of the GSME model), after categorization.


CHAPTER  5 

APPLICATION  TO  MOTIVATING  EXAMPLE 

In this chapter we show how the motivating example from the RERF in Hiroshima, Japan, which was first discussed in Chapter 1, can be modified to fit into the GSME model framework discussed in Chapter 3 and how the parameters of the associated models can be estimated using the procedure described in Chapter 4. Let $Y_1$, the outcome of interest, be the time until the diagnosis of leukemia following A-bomb radiation exposure. The variable contaminated by measurement error is the true radiation dose to which an individual was exposed; it is denoted by $x$ and is estimated by the DS86 system. DS86 stands for the Dosimetry System 1986; it estimates an individual's true, unknown level of radiation exposure, where the imprecision in an individual's estimate comes primarily through uncertainties about survivors' shielding and location relative to the specific bomb's hypocenter [42, p. 275]. The DS86 estimate is denoted by $Y_2$. The identifying information comes through the instrumental variable, which is an indicator of epilation, and this shall be denoted as $Y_3$. Epilation is severe hair loss due to radiation exposure (scalp hair loss of more than 67% [41]) and here a value of 1 indicates epilation and a value of 0 indicates no epilation. We shall heuristically assume that the parameters for this GSME model are identified. Further, for the current application, we shall assume that there is no censored data. Showing how the outcome, $Y_1$, can be represented by a GSME model is not trivial, so we shall save that task for later in the chapter and start with the remaining variables.

Our assumptions follow closely those used by Pierce et al. [43], which have been established by the RERF from previous research [42]. The distributions involved in our assumptions in this chapter are discussed in more detail by Kalbfleisch and Prentice [33]. We assume that $x$ follows a Weibull distribution where its units are measured in grays of radiation. This is denoted as $x \sim \text{Weibull}(\theta_x)$, where $\theta_x = (a, b)$ and $f(x \mid \theta_x) = a b x^{b-1} \exp(-a x^b)$. Estimated radiation dose, $Y_2$, is assumed to follow a log-normal distribution. Therefore, we have $\log(Y_2) \sim N(\mu_2, \sigma_2^2)$, where $\mu_2 = \log(x) = h_{21}(x; \beta_{21})$, in the GSME model notation from Equation 3.4. That is, $\log(Y_2) = \log(x) + u$, where $u \sim N(0, \sigma_2^2)$. Then, conditional on $x$, the log-normal density for $Y_2$ is written as $f_2(y_2 \mid x, \theta_2) = (2\pi)^{-1/2} \rho_2 \, y_2^{-1} \exp\left(-[\log(\lambda_{21} y_2)]^2 \rho_2^2 / 2\right)$, where $\theta_2 = (\lambda_{21}, \rho_2)$, $\lambda_{21} = 1/x$ and $\rho_2 = 1/\sigma_2$. The assumptions for the instrumental variable are as follows: $Y_3 \sim \text{Bernoulli}(\theta_3)$, where $\theta_3 = h_{31}(x; \beta_3) = \exp(\beta_{30} + \beta_{31} x) / [1 + \exp(\beta_{30} + \beta_{31} x)]$ and the conditional distribution written in terms of the conditional distribution parameters is $f_3(y_3 \mid x; \theta_3) = \theta_3^{y_3} (1 - \theta_3)^{1 - y_3}$, where the parameter vector consists of the single element $\theta_3$.
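
The three distributional assumptions above translate directly into code. The sketch below is illustrative only: the parameter values (`a = 1.0`, `b = 1.5`, `x = 2.0`, `sigma2 = 0.5`, and the logistic coefficients) are arbitrary assumptions, not estimates from the RERF data. A simple trapezoid rule is used to check numerically that the two continuous densities integrate to approximately one.

```python
import math

def weibull_pdf(x, a, b):
    """f(x | theta_x) = a b x^(b-1) exp(-a x^b)."""
    return a * b * x ** (b - 1.0) * math.exp(-a * x ** b)

def lognormal_pdf(y2, x, sigma2):
    """f_2(y2 | x) with lambda_21 = 1/x and rho_2 = 1/sigma_2."""
    lam21, rho2 = 1.0 / x, 1.0 / sigma2
    return ((2.0 * math.pi) ** -0.5 * rho2 / y2
            * math.exp(-(math.log(lam21 * y2)) ** 2 * rho2 ** 2 / 2.0))

def epilation_prob(x, b30, b31):
    """Logistic mapping exp(eta) / (1 + exp(eta)), eta = b30 + b31 x."""
    return 1.0 / (1.0 + math.exp(-(b30 + b31 * x)))

def bernoulli_pmf(y3, theta3):
    """f_3(y3 | x) = theta3^y3 (1 - theta3)^(1 - y3)."""
    return theta3 ** y3 * (1.0 - theta3) ** (1 - y3)

def trapezoid(f, lo, hi, m=20000):
    """Composite trapezoid rule on [lo, hi] with m subintervals."""
    h = (hi - lo) / m
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, m):
        total += f(lo + k * h)
    return total * h

# Arbitrary illustrative parameter values (assumptions).
weibull_total = trapezoid(lambda x: weibull_pdf(x, 1.0, 1.5), 0.0, 20.0)
lognormal_total = trapezoid(lambda y: lognormal_pdf(y, 2.0, 0.5), 1e-6, 50.0)
theta3 = epilation_prob(2.0, -3.0, 1.0)
```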

Now that we have seen how the other manifest variables and $x$ can be defined by the mappings of the GSME model, let us consider the outcome of interest: time until the diagnosis of leukemia. Let $\lambda(y_1, x)$ denote the hazard function of $Y_1$, given radiation dose $x$. That is, given dose $x$,

\[
\lambda(y_1, x) = \lim_{\Delta y_1 \to 0} \Pr\left(y_1 \le Y_1 < y_1 + \Delta y_1 \mid Y_1 \ge y_1, x\right) / \Delta y_1,
\]

and is interpreted as the rate of instantaneous failure (e.g., diagnosis of leukemia) after time $y_1$ given that an individual who received dose $x$ is disease free at $y_1$. It is well known (see [33, p. 6]) that

\[
f_1(y_1 \mid x, \theta_1) = \lambda(y_1, x) \exp\left(-\int_0^{y_1} \lambda(u, x) \, du\right).
\]

Therefore, one can specify $\lambda(y_1, x)$ to define $f_1(y_1 \mid x, \theta_1)$. Note that usually the independent variable in a hazard function is time and therefore denoted by $T$; however, we retain $Y_1$ to avoid confusion with the $T$ mappings from the equations in 3.2 in Chapter 3.

Individuals who were in Hiroshima or Nagasaki during the times that the bombs were dropped could have developed leukemia which was not induced by the exposure to this radiation. It is assumed here that the chance of developing a non-radiation induced leukemia exists at the beginning of the study and is constant throughout, never increasing or decreasing. This can be thought of as a steady, baseline or background hazard rate, which is always present in every individual. This baseline hazard rate is denoted by $\lambda_0$ and does not depend on $y_1$ or $x$, and is therefore constant. The fact that a non-radiation induced leukemia could occur for any individual, even though they were still exposed to radiation from the bombs, must be accounted for in the hazard plot. Therefore, the hazard plot for $Y_1$ cannot start with an initial hazard rate of zero because there is some initial chance that every individual in the study could develop a non-radiation induced leukemia. Assume a plot of $\lambda(y_1, x)$ appears as illustrated in Figure 5-1.

We could, as an example, specify $\lambda(y_1, x)$ as

\[
\lambda(y_1, x) = \lambda_0 R_1(y_1, x),
\]

where $\lambda_0$, again, is the baseline hazard rate and $R_1(y_1, x)$ is the relative risk (henceforth, RR) function satisfying $R_1(0, x) = R_1(y_1, 0) = 1$, and $R_1(\infty, x) = 1$. For example, $R_1(y_1, x)$ might be $R_1(y_1, x) = \exp\left[x \exp(\beta_{11} y_1 + \beta_{12} \log y_1)\right]$, $\beta_{11} < 0$ and $\beta_{12} > 0$. Or, we might specify $\lambda(y_1, x)$ to be

\[
\lambda(y_1, x) = \lambda_0 + R_1^*(y_1, x),
\]

where $R_1^*(y_1, x)$ is the attributable risk (henceforth, AR) function satisfying $R_1^*(0, x) = R_1^*(y_1, 0) = 0$ and $R_1^*(\infty, x) = 0$. As an example, $R_1^*(y_1, x)$ might be

\[
R_1^*(y_1, x) = x \exp(\beta_{11} y_1 + \beta_{12} \log y_1) = x \, y_1^{\beta_{12}} \exp(\beta_{11} y_1),
\]

where $\beta_{11} < 0$ and $\beta_{12} > 0$.

Figure 5-1: Hazard Plot for Time Until Diagnosis of Leukemia After A-Bomb Radiation Exposure (horizontal axis: $y_1$)
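
The relative risk and attributable risk forms of the hazard can be written as small functions to check the stated boundary conditions. This is a sketch under assumed, purely illustrative parameter values (`lam0`, `b11`, `b12` below are not estimates from any data); the helper `g` is a name introduced here for the common factor $y_1^{\beta_{12}} \exp(\beta_{11} y_1)$.

```python
import math

def g(y1, b11, b12):
    """exp(b11*y1 + b12*log y1) = y1**b12 * exp(b11*y1); zero at y1 = 0."""
    return 0.0 if y1 == 0 else y1 ** b12 * math.exp(b11 * y1)

def hazard_rr(y1, x, lam0, b11, b12):
    """Relative risk form: lambda_0 * exp[x * g(y1)]."""
    return lam0 * math.exp(x * g(y1, b11, b12))

def hazard_ar(y1, x, lam0, b11, b12):
    """Attributable risk form: lambda_0 + x * g(y1)."""
    return lam0 + x * g(y1, b11, b12)

# Illustrative values only (assumptions): b11 < 0 and b12 > 0 as required.
lam0, b11, b12 = 0.001, -0.5, 1.5
rr_mid = hazard_rr(5.0, 2.0, lam0, b11, b12)
ar_mid = hazard_ar(5.0, 2.0, lam0, b11, b12)
```

Both forms reduce to the baseline rate `lam0` at time zero, at zero dose, and as time grows large, and exceed it in between for a positive dose.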

Modeling $Y_1$ by using either of these two forms for its hazard function, $\lambda(y_1, x)$, would not fit into the GSME model framework because these parameters are functions of both $y_1$ and $x$. In the RERF study, the hazard function for $f_1(y_1 \mid x, \theta_1)$ cannot be modeled by a parameter which uses a RR or an AR function because, to fit into the GSME model framework, the parameter must be a function of $x$ only, the unobserved level of radiation exposure. This parameter is seen in the mappings which define the GSME model as $h_{11}(x; \beta_{11})$, a function of $x$ (and $\beta_{11}$) only. So, at first glance, it might appear that this problem is outside of the scope of the GSME model. It turns out that some survival analysis problems may be modified in such a way as to include a parameter which is a function of $x$ alone, and therefore, in the modified form, meet the requirements that a GSME model must satisfy. In this specific application, writing the hazard function for $Y_1$ as the sum of two separate parts offers a solution to the problem. The hazard function in this case can be separated into two independent components: one component representing the instantaneous rate of failure (diagnosis of leukemia) based solely on radiation exposure and the other component specifying the instantaneous rate of failure based on the baseline, or background, rate for developing a non-radiation induced leukemia. In what follows, we show how this example can be modified to fit into the Latent Class model developed in Chapter 4, due to the fact that one of these two separate components can be modeled to include a parameter which is a function of $x$ alone.

To show how the RERF problem fits into the GSME model framework, recall that $Y_1$ is the time to diagnosis of leukemia after exposure to radiation. This may actually be modeled by combining two separate hazards for the two types of leukemia which may be diagnosed: a radiation induced leukemia or a non-radiation induced leukemia. Let $Y_{11}$ be the time until the diagnosis of a leukemia that was due to radiation exposure and $Y_{12}$ be the time until the diagnosis of a leukemia in a case that was not induced by radiation exposure. Then $Y_1 = \min(Y_{11}, Y_{12})$. Assume that $Y_{11}$ and $Y_{12}$ are independent. Further, assume that $Y_{12}$ has a hazard function denoted by $\lambda_2(y_1)$, which is free of $x$, and $Y_{11}$ has a hazard function denoted by $\lambda_1(y_1, x)$ with a shape illustrated by the top curve only, labeled "Radiation induced hazard rate", in Figure 5-1, with $\lambda_1(y_1, x) = \lambda_1(y_1, \lambda(x))$, where $\lambda_1(y_1, \lambda(0)) = 0$. Then the density function of $Y_{11}$ given $x$ has a parameter that is a function of $x$, namely $\lambda(x)$, since $f_{11}(y_{11} \mid x, \theta_1) = \lambda_1(y_{11}, \lambda(x)) \exp\left(-\int_0^{y_{11}} \lambda_1(u, \lambda(x)) \, du\right)$.

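Now the hazard function of $Y_1$ given $x$ is

$$
\begin{aligned}
\lambda(y_1, x)
&= \lim_{\Delta y_1 \to 0^+} P(y_1 < Y_1 < y_1 + \Delta y_1 \mid Y_1 > y_1)/\Delta y_1 \\
&= \lim_{\Delta y_1 \to 0^+} \left[ P(y_1 < Y_1 < y_1 + \Delta y_1 \cap Y_{11} > y_1 \cap Y_{12} > y_1) / P(Y_{11} > y_1, Y_{12} > y_1) \right] / \Delta y_1 \\
&= \lim_{\Delta y_1 \to 0^+} \left[ P\{(Y_{11} < y_1 + \Delta y_1 \cup Y_{12} < y_1 + \Delta y_1) \cap (Y_{11} > y_1, Y_{12} > y_1)\} / P(Y_1 > y_1)\Delta y_1 \right] \\
&= \lim_{\Delta y_1 \to 0^+} \left[ P\{(y_1 < Y_{11} < y_1 + \Delta y_1 \cap Y_{12} > y_1) \cup (y_1 < Y_{12} < y_1 + \Delta y_1 \cap Y_{11} > y_1)\} / P(Y_1 > y_1)\Delta y_1 \right] \\
&= \lim_{\Delta y_1 \to 0^+} \Big[ \big( P\{y_1 < Y_{11} < y_1 + \Delta y_1 \cap Y_{12} > y_1\} + P\{y_1 < Y_{12} < y_1 + \Delta y_1 \cap Y_{11} > y_1\} \\
&\qquad - P\{y_1 < Y_{11} < y_1 + \Delta y_1 \cap Y_{12} > y_1 \cap y_1 < Y_{12} < y_1 + \Delta y_1 \cap Y_{11} > y_1\} \big) / P(Y_{11} > y_1) P(Y_{12} > y_1) \Delta y_1 \Big] \\
&= \lambda_1(y_1, \lambda(x)) + \lambda_2(y_1) - \lim_{\Delta y_1 \to 0^+} \left[ \frac{P(y_1 < Y_{11} < y_1 + \Delta y_1 \mid Y_{11} > y_1)}{\Delta y_1} \cdot \frac{P(y_1 < Y_{12} < y_1 + \Delta y_1 \mid Y_{12} > y_1)}{\Delta y_1} \cdot \Delta y_1 \right] \\
&= \lambda_1(y_1, \lambda(x)) + \lambda_2(y_1).
\end{aligned}
$$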

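This implies that

$$f_1(y_1 \mid x, \theta_1) = f(y_{11} \mid x, \theta_1) S(y_{12} \mid x, \theta_1) + f(y_{12} \mid x, \theta_1) S(y_{11} \mid x, \theta_1),$$

where $S(\cdot)$ is the survival function, e.g.,

$$S(y_{12} \mid x, \theta_1) = \int_{y_{12}}^{\infty} f(u)\,du = \exp\left(-\int_0^{y_{12}} \lambda_2(u)\,du\right),$$

and this can easily be shown using the fact that

$$f_1(y_1 \mid x, \theta_1) = \lambda(y_1, x) \exp\left(-\int_0^{y_1} \lambda(u, x)\,du\right).$$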


This density is a function of the parameters of $f(y_{11} \mid x, \theta_1)$ and
$f(y_{12} \mid x, \theta_1)$. By our assumptions, this includes a parameter $\lambda(x)$.

Now $f_1(y_1 \mid x, \theta_1)$ can be specified by choosing any density for $Y_{12}$
and any density for $Y_{11}$ with a hazard function of the previously illustrated
shape, by setting one parameter of $f(y_{11} \mid x, \theta_{11})$ to $\lambda(x)$ and
specifying a functional form for $\lambda(x)$.

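In light of this new way of expressing the density for $Y_1$ to fit into the GSME
model framework, let us return to the RERF example specifications. Assume
that $Y_{11} \sim \text{Log-logistic}(\lambda_{11}, p_1)$ and that $\lambda_{11} = \beta x$,
where $\beta > 0$. Then we have

$$f(y_{11} \mid x, \theta_{11}) = \lambda_{11} p_1 (\lambda_{11} y_1)^{p_1 - 1} \left[1 + (\lambda_{11} y_1)^{p_1}\right]^{-2}$$

and

$$\lambda_1(y_1, \lambda(x)) = (\beta x) p_1 ((\beta x) y_1)^{p_1 - 1} / \left[1 + ((\beta x) y_1)^{p_1}\right],$$

a hazard function having the desired shape. Further assume that
$Y_{12} \sim \exp(\lambda_{12})$, i.e., is exponentially distributed. Then
$f(y_{12} \mid x, \theta_{12}) = \lambda_{12} \exp(-\lambda_{12} y_1)$ and
$\lambda_2(y_1) = \lambda_{12}$. So, combining these, we have

$$f_1(y_1 \mid x, \theta_1) = \lambda_{12} \exp(-\lambda_{12} y_1) \left[1 + (\lambda_{11} y_1)^{p_1}\right]^{-1} + \lambda_{11} p_1 (\lambda_{11} y_1)^{p_1 - 1} \left[1 + (\lambda_{11} y_1)^{p_1}\right]^{-2} \exp(-\lambda_{12} y_1)$$

as the density for $Y_1$ in the GSME model.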
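
As a numerical check on the combined density above, the following minimal Python
sketch evaluates the hazard, survival function, and density of $Y_1$ under the
log-logistic and exponential assumptions. The function names and the parameter
values used in any call are illustrative assumptions, not part of the original
development, and the hazard helper assumes $p_1 \geq 1$ so that the case $x = 0$
is well defined.

```python
import math


def loglogistic_sf(y, lam, p):
    """Survival function 1 / (1 + (lam*y)^p) of a log-logistic(lam, p) variable."""
    return 1.0 / (1.0 + (lam * y) ** p)


def loglogistic_hazard(y, lam, p):
    """Hazard lam*p*(lam*y)^(p-1) / (1 + (lam*y)^p); assumes p >= 1."""
    return lam * p * (lam * y) ** (p - 1.0) / (1.0 + (lam * y) ** p)


def y1_survival(y, x, beta, p1, lam12):
    """S(y1 | x) = S(y11) * S(y12), with lam11 = beta * x."""
    return loglogistic_sf(y, beta * x, p1) * math.exp(-lam12 * y)


def y1_hazard(y, x, beta, p1, lam12):
    """Sum of the radiation-induced hazard and the constant background hazard."""
    return loglogistic_hazard(y, beta * x, p1) + lam12


def y1_density(y, x, beta, p1, lam12):
    """f1(y1 | x) = f12 * S11 + f11 * S12, as in the combined density above."""
    lam11 = beta * x
    s11 = loglogistic_sf(y, lam11, p1)
    s12 = math.exp(-lam12 * y)
    f11 = loglogistic_hazard(y, lam11, p1) * s11  # log-logistic density
    f12 = lam12 * s12                             # exponential density
    return f12 * s11 + f11 * s12
```

With $x = 0$ the radiation component vanishes and the hazard reduces to the
background rate $\lambda_{12}$, which matches the requirement
$\lambda_1(y_1, \lambda(0)) = 0$.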

So, for this specification, we have that the distributional parameters are
$\theta_1 = (\lambda_{11}, p_1, \lambda_{12})'$, $\theta_2 = (\lambda_{21}, p_2)'$, and
$\theta_3 = \mu_3$, and therefore we know that $k_1 = 3$, $k_2 = 2$, and $k_3 = 1$.
For the functions of the distributional parameters which are used with Equation 3.4,
we have that $T_1$ is the identity mapping, the mapping $T_2$ consists of two
elements defined by $T_{21}(\lambda_{21}, p_2) = -\log(\lambda_{21}) = \mu_2$ and
$T_{22}(\lambda_{21}, p_2) = 1/p_2^2 = \sigma_2^2$ (so
$\gamma_2 = (\mu_2, \sigma_2^2)'$), and the mapping $T_3$ is also the identity, i.e.,
$T_3(\theta_3) = \gamma_3 = \theta_3$. To complete the specification of parameters
found in Equation 3.4, let
$h_{11}(x; \underline{\beta}_{11}) = \lambda_{11} = \beta x$,
$h_{12}(x; \underline{\beta}_{12}) = p_1$, and
$h_{13}(x; \underline{\beta}_{13}) = \lambda_{12}$, for $Y_1$;
$h_{21}(x; \underline{\beta}_{21}) = \mu_2 = \log(x)$ and
$h_{22}(x; \underline{\beta}_{22}) = \sigma_2^2$, for $Y_2$; and
$h_{31}(x; \underline{\beta}_3) = \theta_3 = \mu_3 = \exp(\beta_{30} + \beta_{31} x) / \left[1 + \exp(\beta_{30} + \beta_{31} x)\right]$,
for $Y_3$.

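Now we have the needed densities to allow us to write Equation 4.1 for this
application. Recall that Equation 4.1 is the joint distribution of the manifest
variables, prior to the categorization step. We have

$$
\begin{aligned}
f(y_1, y_2, y_3) = \int &\Big[ \lambda_{12} \exp(-\lambda_{12} y_1) \left[1 + (\beta x y_1)^{p_1}\right]^{-1}
 + \beta x p_1 (\beta x y_1)^{p_1 - 1} \left[1 + (\beta x y_1)^{p_1}\right]^{-2} \exp(-\lambda_{12} y_1) \Big] \\
&\times (2\pi)^{-1/2} (\sigma_2 y_2)^{-1} \exp\left(-[\log(y_2) - \log(x)]^2 / (2\sigma_2^2)\right) \\
&\times \left[\frac{\exp(\beta_{30} + \beta_{31} x)}{1 + \exp(\beta_{30} + \beta_{31} x)}\right]^{y_3}
 \left[\frac{1}{1 + \exp(\beta_{30} + \beta_{31} x)}\right]^{1 - y_3} \\
&\times a b x^{b-1} \exp(-a x^b)\,dx.
\end{aligned}
$$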

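We categorize $Y_1$, $Y_2$, and $x$, and the analog, then, to the left hand side of
the previous equation is $\pi_{j_1 j_2 j_3}$, a multinomial cell probability, and the
analog to the right hand side is

$$
\sum_{j_x} P(Y_{1i} \in I_{1j_1} \mid x_i \in I_{j_x}, \theta_1)\, P(Y_{2i} \in I_{2j_2} \mid x_i \in I_{j_x}, \theta_2)
\times P(Y_{3i} \in I_{3j_3} \mid x_i \in I_{j_x}, \theta_3)\, P(x_i \in I_{j_x} \mid \theta_x),
$$

where

$$
\begin{aligned}
P(Y_{1i} \in I_{1j_1} \mid x_i \in I_{j_x}, \theta_1) &= S(l_{1j_1}) - S(u_{1j_1}), \\
P(Y_{2i} \in I_{2j_2} \mid x_i \in I_{j_x}, \theta_2) &= S(l_{2j_2}) - S(u_{2j_2}), \\
P(Y_{3i} = 1 \mid x_i \in I_{j_x}, \theta_3) &= \frac{\exp(\beta_{30} + \beta_{31} x_{j_x})}{1 + \exp(\beta_{30} + \beta_{31} x_{j_x})}, \\
P(Y_{3i} = 0 \mid x_i \in I_{j_x}, \theta_3) &= \frac{1}{1 + \exp(\beta_{30} + \beta_{31} x_{j_x})}, \\
P(x_i \in I_{j_x} \mid \theta_x) &= a b x_{j_x}^{b-1} \exp(-a x_{j_x}^b) \times [u_{j_x} - l_{j_x}].
\end{aligned}
$$

Here $S(\cdot)$ denotes the survival function of the corresponding manifest variable
given $x = x_{j_x}$, and $l$ and $u$ denote the lower and upper endpoints of the
indicated interval.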


One adjustment that could be made to our assumptions is that $Y_2$ given $x$
could follow a log-logistic distribution, instead of a log-normal distribution. In
this case we would have $Y_2 \sim \text{Log-logistic}(\lambda_{21}, p_2)$. The advantage
of this assumption is that the CDF of the log-logistic distribution exists in closed
form, and therefore its survival function does as well, whereas that of the log-normal
does not. There are no real disadvantages to making this change in assumptions,
because under the log-logistic assumption, $\log(Y_2)$ follows a logistic distribution,
which is symmetric with only slightly heavier tails than a normal distribution.
Therefore, in practice, the log-normal and log-logistic distributional assumptions for
$Y_2$ given $x$ are essentially interchangeable.
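
The closeness of the logistic and normal distributions can be illustrated
numerically. The short Python sketch below compares the standard normal CDF with
the CDF of a logistic distribution scaled to have mean zero and variance one. The
function names, grid, and range are illustrative assumptions and are not taken
from the dissertation.

```python
import math


def normal_cdf(z):
    """Standard normal CDF written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf_unit_variance(z):
    """CDF of a logistic variable with mean 0 and variance 1 (scale sqrt(3)/pi)."""
    return 1.0 / (1.0 + math.exp(-math.pi * z / math.sqrt(3.0)))


def max_cdf_gap(lo=-6.0, hi=6.0, steps=12001):
    """Largest absolute difference between the two CDFs on an evenly spaced grid."""
    width = (hi - lo) / (steps - 1)
    return max(
        abs(logistic_cdf_unit_variance(lo + k * width) - normal_cdf(lo + k * width))
        for k in range(steps)
    )
```

Calling `max_cdf_gap()` gives the largest discrepancy between the two CDFs on the
grid, while comparing `1 - logistic_cdf_unit_variance(4.0)` with
`1 - normal_cdf(4.0)` shows the heavier logistic tail.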

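Suppose it is assumed that $Y_2 \sim \text{Log-logistic}(\lambda_{21}, p_2)$, where
$\lambda_{21} = 1/x$ and $E(\log Y_2) = \log x$. Then, for the survival function, we
would have $S(y_2) = 1 / \left(1 + (\lambda_{21} y_2)^{p_2}\right)$. Furthermore, for
$Y_1$ we have that $S(y_1) = S(y_{11}) S(y_{12})$, where
$S(y_{11}) = 1 / \left(1 + (\lambda_{11} y_{11})^{p_1}\right)$ and
$S(y_{12}) = \exp(-\lambda_{12} y_{12})$. So, after categorization, for the latent
class model Equation 4.3, we would have for the $(j_1 = 1, j_2 = 1, j_3 = 1)$th
multinomial cell probability

$$
\begin{aligned}
\pi_{111} = \sum_{j_x} &\left[ \frac{\exp(-\lambda_{12} l_{11})}{1 + (\beta x_{j_x} l_{11})^{p_1}} - \frac{\exp(-\lambda_{12} u_{11})}{1 + (\beta x_{j_x} u_{11})^{p_1}} \right] \\
&\times \left[ \frac{1}{1 + ((1/x_{j_x}) l_{21})^{p_2}} - \frac{1}{1 + ((1/x_{j_x}) u_{21})^{p_2}} \right] \\
&\times \left[ \frac{\exp(\beta_{30} + \beta_{31} x_{j_x})}{1 + \exp(\beta_{30} + \beta_{31} x_{j_x})} \right] \\
&\times a b x_{j_x}^{b-1} \exp(-a x_{j_x}^b) \times [u_{j_x} - l_{j_x}],
\end{aligned}
$$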


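where $j_3 = 1$ is equivalent to $Y_{3i} = 1$ and
$\underline{\beta} = (\beta, p_1, \lambda_{12}, p_2, \beta_{30}, \beta_{31}, a, b)'$ is
the vector of model parameters which entered the conditional densities through the
mappings defined in Equation 3.4. In general, the probability of falling into cell
$j_1 j_2 j_3$, where $j_r = 1, 2, \ldots, C_r$, $r = 1, 2, 3$, of the multinomial
distribution is

$$
\begin{aligned}
\pi_{j_1 j_2 j_3} = \sum_{j_x} &\left[ \frac{\exp(-\lambda_{12} l_{1j_1})}{1 + (\beta x_{j_x} l_{1j_1})^{p_1}} - \frac{\exp(-\lambda_{12} u_{1j_1})}{1 + (\beta x_{j_x} u_{1j_1})^{p_1}} \right] \\
&\times \left[ \frac{1}{1 + ((1/x_{j_x}) l_{2j_2})^{p_2}} - \frac{1}{1 + ((1/x_{j_x}) u_{2j_2})^{p_2}} \right] \\
&\times \left[ \frac{\exp(\beta_{30} + \beta_{31} x_{j_x})}{1 + \exp(\beta_{30} + \beta_{31} x_{j_x})} \right]^{2 - j_3}
 \left[ \frac{1}{1 + \exp(\beta_{30} + \beta_{31} x_{j_x})} \right]^{j_3 - 1} \\
&\times a b x_{j_x}^{b-1} \exp(-a x_{j_x}^b) \times [u_{j_x} - l_{j_x}].
\end{aligned}
$$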
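
The general cell probability above can be evaluated directly once intervals are
chosen. The following Python sketch is one possible implementation under stated
assumptions: the representative value $x_{j_x}$ is taken to be the midpoint of
each $x$ interval, the binary variable is coded as `y3` in {0, 1} rather than
$j_3$ in {1, 2}, and all function names and argument orders are hypothetical.

```python
import math


def s1(y, x, beta, p1, lam12):
    """Survival function of Y1 given x: exp(-lam12*y) / (1 + (beta*x*y)^p1)."""
    return math.exp(-lam12 * y) / (1.0 + (beta * x * y) ** p1)


def s2(y, x, p2):
    """Log-logistic survival function of Y2 given x, with lam21 = 1/x."""
    return 1.0 / (1.0 + (y / x) ** p2)


def cell_probability(i1, i2, y3, x_cats, beta, p1, lam12, p2, b30, b31, a, b):
    """Sum over x categories of the product of the four conditional pieces."""
    (l1, u1), (l2, u2) = i1, i2
    total = 0.0
    for lx, ux in x_cats:
        xj = 0.5 * (lx + ux)  # assumed representative value x_jx (interval midpoint)
        p_y1 = s1(l1, xj, beta, p1, lam12) - s1(u1, xj, beta, p1, lam12)
        p_y2 = s2(l2, xj, p2) - s2(u2, xj, p2)
        mu3 = 1.0 / (1.0 + math.exp(-(b30 + b31 * xj)))  # logistic mean of Y3
        p_y3 = mu3 if y3 == 1 else 1.0 - mu3
        p_x = a * b * xj ** (b - 1.0) * math.exp(-a * xj ** b) * (ux - lx)
        total += p_y1 * p_y2 * p_y3 * p_x
    return total
```

Because the pieces for $Y_1$, $Y_2$, and $Y_3$ each sum to one over their own
categories, the cell probabilities sum to the approximate total mass assigned to
the $x$ categories, which gives a simple consistency check on any implementation.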


CHAPTER  6 

OTHER  POSSIBLE  APPLICATIONS 

6.1 Application for MCHERDC with Developmental Delay and Disability Outcome

The Maternal, Child Health and Education Research and Data Center
(henceforth, MCHERDC), in the College of Medicine at the University of Florida,
has access to many state-wide data sets for epidemiological research studies
in the area of maternal and child health. In a future study, we will apply the
methods presented in this dissertation to data from MCHERDC. The outcome
variable, $Y_1$, will be an indicator (Yes/No) of early childhood developmental
delay or disability (henceforth, DDD). The mean of this manifest variable will
be modeled as a function of the true gestational age (henceforth, GA), which is
denoted as $x$.

The outcome of interest is an indicator of DDD, so for each individual, we
know that $Y_1$ follows a Bernoulli distribution where we are interested in modeling
the mean, i.e., the proportion of “successes” or DDDs. We assume a logistic mean
function. Therefore, the mapping for this variable defined in Equation 3.1 that we
are interested in modeling is

$$\gamma_{11} = h_{11}(x; \underline{\beta}_1) = \frac{\exp(\beta_{10} + \beta_{11} x)}{1 + \exp(\beta_{10} + \beta_{11} x)}.$$

So $\underline{\beta}_1 = (\beta_{10}, \beta_{11})'$, where $\beta_{10}$ and
$\beta_{11}$ are the intercept and slope, respectively, of the linear predictor
in a logistic regression model. The mapping related to $\gamma_{11}$ written in terms
of the conditional distribution parameters, that is, the mapping defined in Equation
3.2, is $\gamma_{11} = P(Y_1 = 1 \mid x) = h_{11}(x; \underline{\beta}_1)$, where
$Y_1 = 1$ if a DDD is observed, zero otherwise. Now, because
$\theta_1 = E(Y_1 \mid x)$, where $\theta_1$ is $P(Y_1 = 1 \mid x)$, the density in
terms of the conditional distribution parameter is then
$f(y_1 \mid x, p) = p^{y_1}(1 - p)^{1 - y_1}$, when $Y_1$ is Bernoulli. Since this
variable is already dichotomous, there is no need
for further categorization during the estimation procedure for the GSME model.
Assuming $x$ has been categorized, the probabilities used in developing the right
hand side of the multinomial relation $p = \pi(\underline{\beta}) + e$, i.e.,
Equation 4.4, for this variable would simply be

$$
\begin{aligned}
P(Y_1 = 1 \mid x \in I_{j_x}, \theta_1) &= \frac{\exp(\beta_{10} + \beta_{11} x_{j_x})}{1 + \exp(\beta_{10} + \beta_{11} x_{j_x})}, \\
P(Y_1 = 0 \mid x \in I_{j_x}, \theta_1) &= \frac{1}{1 + \exp(\beta_{10} + \beta_{11} x_{j_x})}.
\end{aligned}
$$

The model parameters are not identified, of course, without additional information.
We now discuss estimation using an instrumental variable.

There are many different ways to estimate true GA, but the one assumed
to be most reliable is the estimate that is based on the mother’s last menstrual
period, or LMP [39], i.e., GA=DOB-(LMP+14). This estimate based on LMP
will serve as $Y_2$, the measured value of $x$, true GA, in this application. It is
reasonable to assume that $Y_2 = x + u$, where $E(u) = 0$, and that $Y_2$ is
logistically distributed (a symmetric distribution with only slightly heavier tails
than the normal distribution).

There are a number of possibilities to consider for $Y_3$, the instrumental
variable. One very easily accessible variable is birth weight (BW). Whether
BW meets the requirements of an IV, however, is debatable. While there is no
question that BW is independent of the error in $Y_2$, i.e., the measurement error,
the independence between BW and the model error in $Y_1$ is questionable. One
school of thought is that given true GA, BW may not have an effect on DDD,
i.e., at a fixed value for true GA, there may not be a correlation between BW and
DDD. If this assumption were true, BW would satisfy the criteria for an IV and its
accessibility in current data sets would make it a prime candidate as the IV in this
application. A second variable that is easily accessible and might serve as an IV is
length of stay (LOS), which is the amount of time an infant remains in the hospital
after  birth.  This  variable  is  clearly  correlated  with  true  GA,  as  a very  premature 
infant  would  most  certainly  have  a long  LOS  after  birth.  Further,  LOS  is  not 
used  in  estimating  GA,  thus  LOS  is  clearly  uncorrelated  with  the  measurement 
error  in  the  estimated  value  of  GA.  The  assumption  would  have  to  be  made  that 
LOS  is  independent  of  the  model  error  in  the  outcome,  DDD.  This  is  the  variable 
considered  for  the  IV  at  the  end  of  this  section,  where  some  details  from  Chapters  3 
and  4 are  given  in  the  context  of  this  example. 

Other possible choices for $Y_3$ come from other independent estimates of GA.
There are at least three alternative methods for estimating GA. All of these
estimates come from neonatal (immediately after birth) methods and therefore
would be independent of the measurement error in estimating true GA through
LMP. One problem is that these variables are not immediately accessible in the
data sets most commonly used by MCHERDC, thus some merging of data sets
would be required prior to analysis. The three other methods of estimation for GA
are the Ballard score, which is from measurements on the infant taken immediately
after birth (see [39, 53]), an estimate based on the biparietal diameter, i.e., a
head measurement [37], or an estimate based on sonographic measurement of
femur length [39]. Further research or discussions with practitioners, as well as
determining accessibility of these variables, would be required to determine which
of these would serve best as the IV, $Y_3$.

Once  these  initial  issues  are  resolved,  the  first  step  in  estimating  the  model 
parameters  will  be  to  categorize  all  continuous  variables  and  develop  an  analogous 
equation  to  Equation  4.3,  the  extended  Latent  Class  Model  analog  to  the  joint 
distribution  of  the  manifest  variables,  but  in  terms  of  the  variables  specific  to 
this application. As was seen above, $Y_1$ is already a categorical manifest variable,
and we know that $Y_2$, $Y_3$, and $x$ would follow continuous distributions that would
require the categorization described in Chapter 4. The joint marginal distribution
of $Y_1$, $Y_2$, and $Y_3$, after categorization, is, again, a multinomial distribution.
The starting point for the probability of falling into cell $j_1 j_2 j_3$ of the cross
classification of $Y_1$ and the remaining categorized variables is written as


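$$
\begin{aligned}
\pi_{j_1 j_2 j_3} = \sum_{j_x} &\left[ \frac{\exp(\beta_{10} + \beta_{11} x_{j_x})}{1 + \exp(\beta_{10} + \beta_{11} x_{j_x})} \right]^{j_1}
 \left[ \frac{1}{1 + \exp(\beta_{10} + \beta_{11} x_{j_x})} \right]^{1 - j_1} \\
&\times \int_{I_{2j_2}} f_2(y_2 \mid x = x_{j_x}, \theta_2)\,dy_2
 \int_{I_{3j_3}} f_3(y_3 \mid x = x_{j_x}, \theta_3)\,dy_3 \;
 f(x_{j_x} \mid \theta_x) L_{j_x},
\end{aligned}
\qquad (6.1)
$$

where $j_1$ is zero or one, for no DDD or observing a DDD, respectively.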

Let us now consider the remaining variables. Assume that true GA, $x$,
follows a Weibull distribution, $Y_2$, the estimated value of GA, follows a Logistic
distribution (as mentioned above), and that for $Y_3$, the IV, we consider LOS and
assume it follows a Log-Logistic distribution. These distribution assumptions are
similar to those made by Resnick et al. [47]. As mentioned above, LOS is clearly
related to true GA and is not used in estimating GA, therefore it is independent of
the measurement error. We must heuristically assume LOS is uncorrelated with the
model error in $Y_1$.

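There are no mappings required for $x$ in the GSME model. Therefore, assuming
$x \sim \text{Weib}(a, b)$, we have
$P(x \in I_{j_x} \mid \theta_x) = a b x_{j_x}^{b-1} \exp(-a x_{j_x}^b)\,[L_{j_x}]$ for the
$j_x$th category of $x$. This is the required piece for
$f(x_{j_x} \mid \theta_x) L_{j_x}$ in Equation 6.1. Assuming
$Y_2 \sim \text{logistic}(\lambda_2, p_2)$, and that it is an unbiased measurement of
GA, the mappings defined in Chapter 3 for this variable would be

$$
\begin{aligned}
\gamma_{21} &= h_{21}(x; \underline{\beta}_{21}) = x, \\
\gamma_{22} &= h_{22}(x; \underline{\beta}_{22}) = \beta_{22}, \\
\gamma_{21} &= T_{21}(\theta_2) = \lambda_2, \\
\gamma_{22} &= T_{22}(\theta_2) = p_2 \pi / \sqrt{3}.
\end{aligned}
$$

A closed form CDF exists for this distribution, so for
$\int_{I_{2j_2}} f_2(y_2 \mid x = x_{j_x}, \theta_2)\,dy_2$ in Equation 6.1 we have that

$$
P(Y_2 \in I_{2j_2} \mid x \in I_{j_x}, \theta_2) =
\left[ 1 + \exp\left(-\pi (u_{2j_2} - x_{j_x}) / \sqrt{3}\beta_{22}\right) \right]^{-1}
- \left[ 1 + \exp\left(-\pi (l_{2j_2} - x_{j_x}) / \sqrt{3}\beta_{22}\right) \right]^{-1}.
$$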
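
The interval probability for $Y_2$ is straightforward to compute. The Python
sketch below implements the logistic CDF parameterized by its mean and standard
deviation, reading $\beta_{22}$ as the standard deviation of $Y_2$ given $x$ (an
interpretation assumed here from the mapping $T_{22}$ above). The function names
are hypothetical.

```python
import math


def logistic_cdf(y, mean, sd):
    """CDF of a logistic variable with the given mean and standard deviation."""
    return 1.0 / (1.0 + math.exp(-math.pi * (y - mean) / (math.sqrt(3.0) * sd)))


def y2_interval_prob(lower, upper, x_j, beta22):
    """P(lower < Y2 <= upper | x = x_j) when Y2 is logistic with mean x_j, sd beta22."""
    return logistic_cdf(upper, x_j, beta22) - logistic_cdf(lower, x_j, beta22)
```

For example, `y2_interval_prob(34.0, 37.0, 38.0, 1.5)` would give the probability
that the LMP-based estimate falls between 34 and 37 weeks when the representative
true GA for the category is 38 weeks and the measurement standard deviation is
1.5 weeks. Both numerical values are purely illustrative.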


Using LOS as the IV providing identifying information to this GSME model would
require modeling a segmented linear predictor for $\gamma_{31}$. The reason for this
is that, taking the ideal GA to be 36 weeks, the slope of the line below 36 weeks
must be negative, since infants with a GA far below 36 weeks would require a longer
LOS, with LOS decreasing as GA approaches 36 weeks from the left. Further, as
GA increases beyond 36 weeks, infants tend to be sicker, requiring a longer LOS for
GAs much greater than 36 weeks, thus requiring a positive slope for values above 36
weeks. The segmented linear predictor with an inflection point at 36 weeks would be
given as
$\eta_3 = \beta_{30} + (x - 36)\beta_{31} I_{(x < 36)} + (x - 36)\beta_{32} I_{(x \geq 36)}$,
where $I_{(x < 36)}$ is an indicator equaling one if $x < 36$ weeks, and is zero
otherwise. The second indicator is defined in a similar manner. So assuming
$Y_3 \sim \text{log-logistic}(\lambda_3, p_3)$, the mappings required for $Y_3$ would be


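$$
\begin{aligned}
\gamma_{31} &= h_{31}(x; \underline{\beta}_{31}) = \eta_3, \\
\gamma_{32} &= h_{32}(x; \underline{\beta}_{32}) = \beta_{33}, \\
\gamma_{31} &= T_{31}(\theta_3) = 1/\lambda_3, \\
\gamma_{32} &= T_{32}(\theta_3) = 1/p_3.
\end{aligned}
$$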
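
The segmented linear predictor $\eta_3$ described above can be sketched in a few
lines of Python. The function name and the `knot` argument are hypothetical, and
the value at exactly 36 weeks is assigned to the upper segment, which is an
assumption about how the second indicator is defined.

```python
def eta3(x, b30, b31, b32, knot=36.0):
    """Segmented linear predictor with slopes b31 below the knot and b32 at or above it."""
    below = 1.0 if x < knot else 0.0  # indicator I(x < knot)
    above = 1.0 - below               # indicator I(x >= knot), an assumed convention
    return b30 + (x - knot) * b31 * below + (x - knot) * b32 * above
```

With a negative `b31` and a positive `b32`, the predictor attains its minimum
`b30` at the knot and increases in both directions, matching the shape argued for
in the text.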

So for the remaining piece in Equation 6.1,
$\int_{I_{3j_3}} f_3(y_3 \mid x = x_{j_x}, \theta_3)\,dy_3$, we would have

$$P(Y_3 \in I_{3j_3} \mid x \in I_{j_x}; \theta_3) = \left[ 1 + \cdots \right]^{-1} \cdots$$

For this example, under the current assumptions, we have as our model parameters
$\underline{\beta} = (\beta_{10}, \beta_{11}, \beta_{22}, \beta_{30}, \beta_{31}, \beta_{32}, \beta_{33}, a, b)'$.
At this point, intervals may be chosen for the continuous variables, and once
derivatives with respect to the elements of $\underline{\beta}$ are taken for use in
the EGNLS procedure, as discussed in Chapter 4, the example is ready to be applied to
data, which are available from MCHERDC at the University of Florida. The actual
application to data will be done at a future time.

6.2  Application  to  Alzheimer’s  Disease 

One of the examples discussed in the introductory chapter to this dissertation
is the measurement error problem which occurs when one attempts to determine the
association between the likelihood of developing Alzheimer’s disease (AD) and the
level of aluminum deposits which have accumulated in the brain over time (see, for
example, [12]). Recall that measurement error arises because determining a perfectly
accurate level of aluminum deposits in the brain is not possible. Assuming one has,
for example, some clinical estimate of the amount of aluminum in the brain, this
problem could be handled by a GSME model, in the presence of an IV.

The outcome variable of interest, $Y_D = Y_1$, say, would be dichotomous and,
obviously, take on the two possible values of AD or no AD. This outcome could
be modeled using logistic regression under a generalized linear model. The clinical
estimate of the amount of aluminum deposits in the brain would be represented
by $Y_2$ in the GSME model, as the observed value which is contaminated with
measurement error. The unattainable, true value for the amount of aluminum
deposits would of course be $x$.

A major difficulty posed in this application is finding a variable which satisfies
the requirements of an IV. This would be the third manifest variable, $Y_3$, in the
GSME model, which is the source of identifying information. Because it is not
clearly known how aluminum enters the body and accumulates over time, finding
a variable related to the level of aluminum in the brain, or body in general, but
independent of both the measurement error in the aluminum level estimate and of
the model error in $Y_1$ (i.e., conditionally independent of $Y_1$) proves to be a
difficult task. Following are a few ideas, but they may rather serve as possible
starting places for future research, as few provide a plausible solution.

The immediate choice for $Y_3$ would be a second, independent method of
estimating aluminum deposits in the brain. As most estimates of aluminum
deposits come post-mortem [3], it may be difficult in itself to find one method
reliable enough to use as the measured value. A second possibility for $Y_3$ is the
indicator of another disease known to be related to aluminum in the brain. One
such example is a disease known as encephalopathy [3], which is a neurological
disorder caused by the deterioration of brain functions. Although this deterioration
of brain function may not directly be a cause of AD, it is possible that such a
disease has some level of correlation with AD. There would be a need for clear,
quantifiable indicators of this neurological illness, separate from indications of AD.
If encephalopathy is an illness not related to Alzheimer’s, it could serve as $Y_3$, as
it would further be related to the true $x$, but also independent of the measurement
error of $x$.

The following two variables could serve as IVs, if they proved to provide
attainable values in humans. The reason that attaining such values is in question
is that the following studies have been performed on laboratory animals only.
They are worth mentioning to serve as starting points for relevant future research.
Bushinsky et al. [8] found a relation between aluminum in the body and bone surface
ion composition in laboratory mice. The ions considered on the bone surface and
eroded subsurface were $^{23}$Na, $^{27}$Al, $^{39}$K, and $^{40}$Ca. The ion content of
the bone surface would be a variable related to the true amount of aluminum in the
body, but presumably not related to the model error in the likelihood of developing
AD nor the error in $Y_2$. Yokel and O’Callaghan [67] found that an increase in
acidic protein in glial fibers in laboratory rabbits was related to aluminum levels
injected over time. Glial fibers are contained within the soft tissue which surrounds
the central nervous system. Again, this offers a plausible starting place for future
research, as levels of such acidic protein are related to amounts of aluminum
deposits, but may be uncorrelated with the likelihood of developing AD and also
uncorrelated with the measurement error in $x$.

A possible source of aluminum in the body may be environmental
factors, such as drinking water. A purely speculative variable for $Y_3$ could
be the location of an individual. If it is determined that different living locations,
perhaps between countries, are correlated with different levels of aluminum deposits
in the brain, then this variable may serve as a possible IV. Living location is
most likely not correlated with AD. This variable offers possibly the best choice for
$Y_3$; however, no research has been found stating such a relation.

In medical research, controlling for error free covariates is often important.
These variables, which are related to the outcome of interest, have been mentioned
throughout this dissertation and denoted by $Z$. Including error free covariates in
the GSME model is discussed as possible future work in Chapter 7. In this
application, there are obvious $Z$ covariates which are at least worth mentioning
here. These are variables which are not contaminated by measurement error, but are
related to the likelihood of developing AD. One such covariate is an individual’s
sex. It has been found that women are three times as likely to develop AD as are men
[27]. Possible evidence for the use of age as a $Z$ covariate is that, at least in
women, it has been seen that an increased age at which menopause begins decreases the
risk of developing Alzheimer’s [25]. Another possible error free covariate that would,
ideally, need to be controlled for is race. It is known that there is a decreased risk
of Alzheimer’s disease associated with being white [59].


This application of the GSME model proves to be difficult, not only due to
the lack of a viable IV, but due also to the problems in attempting to estimate
aluminum levels in the first place. Alfrey et al. [3] discuss in some detail the
difficulty of estimating levels of aluminum in the brain, as well as in muscle tissue
and bone. They admit that “a major problem in determining tissue [including
brain] aluminum content has been the lack of a sensitive method for aluminum
analysis.” [3, p. 184] They used a flameless atomic absorption technique, performed
at autopsy, which had a detection limit of 6 ppb for aluminum. It is
unknown to the author of this dissertation what improvements have been made in
determining aluminum levels or if newer, entirely different methods for estimating
aluminum levels in live patients exist. One can clearly see how this important, real
world application requires measurement error techniques, which could
be handled through the use of a GSME model, to acquire a model for estimation or
prediction purposes whose parameter estimates are not biased, or at least where the
bias is reduced.


CHAPTER  7 

DISCUSSION  AND  FUTURE  WORK 

In closing this dissertation, we end with a chapter briefly summarizing and
discussing some of the more crucial aspects of the GSME model defined in Chapter
3, the methodology established in Chapter 4 to estimate the model parameters,
and the model’s numerous and varied applications. We also briefly
discuss plans for work to be completed in the future on this topic as well as a
potential alternative solution for the GSME model which may be worth pursuing.
Some of these items have only briefly been discussed as future work; others were
worked out in detail prior to abandoning them until a future time. For the items
where such details have been worked out, an attempt was made to include as many
of those details as possible here.

7.1  Discussion  of  the  GSME  Model 

The  GSME  model  utilizes  a realistic  means  of  model  identification  and  is 
generally  applicable.  We  have  seen  how  many  well  studied  ME  models  fit  into  the 
GSME  model  framework  and  its  minimum  of  assumptions  makes  it  an  accessible 
and  practical  method  for  real  world  applications. 

The possible impacts of this newly defined measurement error model are
unknown until more research is done in this area. GSME models encompass linear
and non-linear models alike, and can handle multinomial-categorical, discrete, and
continuous variables. This ME model may be applied to
GLMs, but it also improves upon them in the sense that the GSME model is not
restricted to have a linear predictor.

The theorems which provide conditions under which the GSME model
is identified are significant contributions to the literature on ME models. For
models which tend to be more complex than a simple linear model with normality
assumptions, there are no theorems which provide generally applicable rules for
model identification. The theorems we provide in Section 3.3 of Chapter 3 may
not be applicable in all situations, because the joint
marginal distribution of the manifest variables is often unknown, but at a minimum,
they provide a basis for thought. In an early paper, Reiersøl [46] provides theorems
for identifiability in linear ME models. We now summarize one of his theorems.
Assuming the model error and measurement error are distributed as a bivariate
normal random vector in a linear model, the slope, $\beta_1$, is identified if and only
if at least one of either $x$ or $y$, in the equation written without model error, i.e.,
$y = \beta_0 + x\beta_1$, is not normally distributed [46, p. 380]. It was seen in
Section 2.4 of Chapter 2 that the question of identifiability has been addressed only
in certain situations, but to the author’s knowledge, at the time this dissertation
was written, no general theorems exist for large classes of models which are more
complex than a linear model.
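
The role of non-normality in this result can be illustrated with a small Python
sketch. It assumes a linear model $y = \beta_0 + \beta_1 x + e$ observed through
$w = x + u$, with $x$, $e$, and $u$ mutually independent and normal, so that the
joint distribution of $(w, y)$ is determined by its first two moments. The
function names and numerical values are hypothetical and serve only to show that
two different values of $\beta_1$ can produce identical observable moments.

```python
def observed_moments(beta0, beta1, mu_x, var_x, var_u, var_e):
    """Return (E[w], E[y], Var[w], Cov[w, y], Var[y]) for w = x + u, y = beta0 + beta1*x + e."""
    return (
        mu_x,
        beta0 + beta1 * mu_x,
        var_x + var_u,
        beta1 * var_x,
        beta1 ** 2 * var_x + var_e,
    )


def equivalent_parameters(moments, new_beta1):
    """Solve for a parameter set with slope new_beta1 that reproduces the given moments."""
    mean_w, mean_y, var_w, cov_wy, var_y = moments
    var_x = cov_wy / new_beta1
    return dict(
        beta0=mean_y - new_beta1 * mean_w,
        beta1=new_beta1,
        mu_x=mean_w,
        var_x=var_x,
        var_u=var_w - var_x,                   # must stay nonnegative to be admissible
        var_e=var_y - new_beta1 ** 2 * var_x,  # must stay nonnegative to be admissible
    )
```

Whenever the variances returned by `equivalent_parameters` are nonnegative, the
two parameter sets cannot be distinguished from data on $(w, y)$ alone in the
all-normal case, which is why additional information such as an IV is needed.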

One of the key assumptions used in the established methodology is that of
conditional (given $x$) independence of the manifest variables. In practice, this
assumption may often be questionable, or it simply may be desired to use variables
which are known to be conditionally dependent. Among the numerous applications
of the GSME model lies a solution to this problem of conditionally dependent
manifest variables. The solution has been alluded to throughout Chapters 3 and
4. It was shown in Section 3.2 of Chapter 3 how a multivariate regression model
fits into the framework of, and therefore its parameters may be estimated using, a
GSME model. By grouping conditionally dependent variables together to form a
multivariate manifest vector which is conditionally independent of the remaining
manifest variables, we have a solution to the case where the crucial assumption of
conditional independence is violated between certain variables.


Although  many  of  the  applications  exemplified  in  this  dissertation  are  in  the 
medical  field,  as  mentioned  in  Chapter  1,  measurement  error  models,  and  thus  the 
GSME  model,  can  be  applied  to  a number  of  areas  outside  of  the  medical  field  as 
well.  We  saw  in  Section  3.2  of  Chapter  3 how  many  well  studied  models  fit  into  the 
GSME  model  set-up.  These  models,  obviously,  can  be  found  in  fields  other  than 
the  medical/health  areas.  Such  models  can  be  used  to  describe  situations  in  the 
social  sciences,  business  applications,  agricultural  applications,  as  well  as  many 
others.  Thus,  the  GSME  model  provides  a solution  to  numerous  situations  where 
there  is  an  independent  variable  that  is  not  observed  directly,  but  rather,  we  only 
have  access  to  a realization  of  that  variable  which  is  measured  with  error. 

7.2  Monte  Carlo  Study  and  Asymptotic  Efficiency 

An  important  area  of  future  work  is  to  perform  a Monte  Carlo  study  to 
determine  the  loss  of  efficiency  in  the  estimator  derived  for  the  GSME  model. 

There  is  some  initial  concern  that  our  estimator  may  lose  efficiency  due  to  the  fact 
that  information  is  lost  in  the  categorization  step  in  defining  the  model  as  a Latent 
Class  model  to  apply  to  EGNLS.  We  have  two  potential  ways  for  conducting  a 
Monte  Carlo  study.  One  is  to  assume  that  all  variables  are  normally  distributed, 
since  we  can  solve  the  problem  in  the  normal  case  using  methods  described  in 
Chapter  2,  and  compare  the  MSE  from  this  known  solution  to  the  MSE  from  our 
method,  solving  the  all  normal  case  through  the  methodology  established  for  the 
GSME  model  in  Chapters  3 and  4.  The  second  possible  idea  is  to  start  with  coarse 
categorizations of the continuous and discrete variables and compare the MSEs as
the  categorizations  become  finer  and  finer,  approaching  continuous  distributions. 
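As a rough illustration of the second idea, the following Python sketch tracks how the MSE of a simple estimator changes as a categorization becomes finer. It does not implement the GSME estimator. It assumes a single normal variable, replaces each observation by the midpoint of the interval it falls in, and estimates the mean from those midpoints. The function names, sample size, and interval widths are hypothetical choices made only for this example.

```python
import numpy as np

def grouped_mean_mse(width, n=200, reps=2000, mu=0.3, sigma=1.0, seed=0):
    """Monte Carlo MSE of a mean computed from interval midpoints.

    Each draw is replaced by the midpoint of the interval of length
    `width` that contains it, mimicking a categorization step.
    """
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=(reps, n))
    midpoints = (np.floor(y / width) + 0.5) * width
    estimates = midpoints.mean(axis=1)
    return float(np.mean((estimates - mu) ** 2))

def raw_mean_mse(n=200, reps=2000, mu=0.3, sigma=1.0, seed=0):
    """Monte Carlo MSE of the ordinary sample mean on the same draws."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, sigma, size=(reps, n))
    return float(np.mean((y.mean(axis=1) - mu) ** 2))

# Coarse to fine categorizations, then the uncategorized benchmark.
for w in (2.0, 1.0, 0.5, 0.1):
    print(w, grouped_mean_mse(w))
print("raw", raw_mean_mse())
```

Under these assumptions the MSE from the coarsest categorization is the largest, and it approaches the uncategorized benchmark as the intervals shrink. A study of the GSME estimator itself would replace the midpoint mean with the EGNLS fit.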

Knowing that efficiency is lost due to categorization raises the question of
whether efficiency can in any way be regained. We believe that an alternate way of
categorizing the non-categorical manifest variables may result in an estimator
which is asymptotically efficient, or at least more efficient than the estimator
under the categorization discussed in Chapter 4. One possibility may be to have
the newly defined categories be such that as the sample size increases, the length
of the intervals defining the categories decreases slowly enough as to not lead
to inconsistent cell proportions, $\pi$, but still gain asymptotic efficiency. In this
scenario,  as  n increases,  the  categorization  becomes  finer  and  finer,  and  thus  the 
newly  defined  categorical  distribution  theoretically  approaches  that  of  a continuous 
distribution.  We  know  from  Shah  [52]  that  in  the  diagnostic  testing  situation, 
when  the  binary  variables  are  left  unchanged,  the  resulting  estimators  from  the 
EGNLS  method  are  asymptotically  efficient.  For  the  GSME  model  under  this 
hypothetical  new  categorization,  knowing  that  the  initial  categorical  variables 
remain  unchanged,  and  if  the  non-categorical  variables  are  categorized  in  such  a 
way  as  to  approach  a continuous  distribution  with  an  increase  in  sample  size,  the 
estimators  derived  from  the  EGNLS  procedure  may  be  more  efficient  than  the 
estimators  derived  in  Chapter  4,  or  perhaps,  even  asymptotically  fully  efficient. 
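The trade-off in this hypothetical categorization can be sketched numerically. The example below assumes a standard normal manifest variable and an interval width of the form n^(-delta), which is our own illustrative choice and not a rate established in this dissertation. It looks only at the interval centered at zero and reports its width and its expected count.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def central_cell(n, delta):
    """Width and expected count of the interval centered at 0.

    The width shrinks like n**(-delta); the expected count is n times
    the exact standard normal probability of the interval.
    """
    h = n ** (-delta)
    pi = normal_cdf(h / 2.0) - normal_cdf(-h / 2.0)
    return h, n * pi

for n in (10**2, 10**4, 10**6):
    print(n, central_cell(n, delta=0.25))
```

For 0 < delta < 1 the width goes to zero while the expected count still grows, which is the behavior the categorization would need. Cells in the tails have much smaller probabilities, so they would be the binding constraint in practice.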

7.3  The  GSME  Model  with  Error-Free  Covariates 
It was seen in Chapter 2 that many of the existing solutions to measurement
error problems have the capacity to include covariates which are free of
measurement error. Recall that these covariates have previously been denoted by Z. As
was  previously  mentioned,  this  would  be  an  important  extension  of  the  GSME 
model,  as  there  are  many  applications  where  controlling  for  such  variables  would 
be  required.  For  example,  in  the  RERF  application,  there  is  information  contained 
in  the  error  free  covariates  age  at  time  of  bomb,  sex,  and  city.  These  are  variables 
which  are  not  related  to  the  amount  of  radiation  exposure,  but  they  are  related 
to  the  outcome  of  interest,  time  until  the  diagnosis  of  leukemia.  A valid  analysis 
should  control  for  these  variables  when  estimating  the  effect  of  radiation  dose.  The 
potential problem with this extension is that, in applications, a sample large enough
to allow estimation of the $\pi$ parameters using the method described in Chapter 4
may not be available.

The basic idea behind including error free covariates in the GSME model
is not complex. To allow full generality, continuous, discrete, and
multinomial-categorical error free variables would be allowed. The non-categorical Z covariates
would be categorized in a similar manner to that described for the manifest
variables in Chapter 4. In the presence of these variables, we would assume that all
conditional distributions of the manifest variables, given x and Z, would be known.
Further, the distribution of x given Z would also be known.

The joint marginal distribution of the manifest variables would initially be
written in a similar form to Equation 4.1, except the right hand side would be
multiplied by the distributions of the Z variates and also would be integrated over
the ranges of all continuous error free covariates and summed over the discrete
values or categories of all discrete and multinomial categorical error free variables,
respectively. The same complications arise when considering the analog to
Equation 4.1 in the presence of Z variables, in that the joint marginal distribution
of all of the Y's would in most cases still be unknown and the integration required
on the right hand side of the analog equation would still most likely not exist in
closed form. The parameter vector of interest changes in that it would
also now contain the parameters which describe the relation between the Y's and the
Z's, between x and the Z's, and the distributional parameters of the marginal distributions
of the error free covariates.

Since the theorems on identifiability, presented in Section 3.3 of Chapter 3, are
applied to the joint distribution of the manifest variables, these results would still
hold in the presence of error free covariates, because such variables are not present
in the joint distribution of all the Y's. One difference, as mentioned above, is
that the vector of model parameters, $\beta$, would be of greater dimension. This may,
perhaps, make it more difficult for a model in the presence of error free covariates
to be identified.

For each possible cross-classification of the categories of the categorized Z
covariates, we would obtain the multinomial relation $\hat{\pi} = \pi(\beta) + \epsilon$. Therefore, for
fixed levels of the error free covariates, we could estimate the model parameters.
The possible setback is that there likely would not be enough observations
available to fill all possible cells in the multinomial relation $\hat{\pi} = \pi(\beta) + \epsilon$ for every
possible combination of the categorized error free covariates. All combinations of
cross-classifications of the Z covariates together form the nonlinear system of
equations which is solved by the EGNLS procedure. Assuming a sample size exists that
is large enough to fill all possible cells in the entire vector of multinomial
proportions, the estimation procedure remains the same, and the asymptotic properties
established at the end of Chapter 4 would still hold for the parameter estimates
in the presence of Z covariates. The asymptotic theorems would generalize to
this case easily since those theorems are based on the vector of model parameters,
$\beta$, and their relation to the nonlinear system of multinomial proportions. Those
relations are not directly affected by the inclusion of error free covariates.
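The sample size concern can be made concrete with a small counting sketch. The category counts below are hypothetical and are not taken from the RERF data. The code counts the cells produced by cross-classifying the manifest variables with the categorized covariates, and gives the expected share of empty cells if n observations fell uniformly over them.

```python
import numpy as np

def n_cells(manifest_categories, covariate_categories):
    """Total number of multinomial cells after cross-classification."""
    return int(np.prod(manifest_categories) * np.prod(covariate_categories))

def expected_empty_fraction(n, cells):
    """Expected share of empty cells when n observations fall uniformly.

    Equal cell probabilities are a best case: unequal probabilities
    leave more cells empty on average.
    """
    return (1.0 - 1.0 / cells) ** n

# Hypothetical layout: three manifest variables with 4, 4, and 2
# categories, and covariates with 5, 2, and 2 categories.
cells = n_cells([4, 4, 2], [5, 2, 2])
print(cells, expected_empty_fraction(640, cells))
```

With these illustrative counts there are 640 cells, and even a sample of 640 observations would be expected to leave more than a third of them empty.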

Since  sample  sizes  in  applications  would  most  likely  not  be  large  enough  to 
allow  for  parameter  estimation,  future  work  in  this  area  would  most  likely  be 
simulation-based  or  only  applied  to  examples  which  require  a minimum  number 
of  categories  for  the  variables.  This  second  possible  area  for  researching  the 
inclusion  of  error  free  covariates  would  most  likely  best  be  applied  to  variables 
which  are  already  categorical,  having  few  categories.  Since  categorization  would 
not  be  required,  the  large  amount  of  information  that  would  be  lost  due  to  the 
categorization  of  variables  into  a small  number  of  categories,  would  be  retained. 


7.4  An  Alternate  Method 

Possible  future  work  could  also  be  the  further  development  of  a second,  general 
IV  estimation  method  for  GSME  models.  This  potential  second  method  differs 
from  the  one  presented  in  this  dissertation  because  it  begins  with  an  approximation 
that may lead to a slight bias in parameter estimation; however, there is no
potential  for  a loss  of  efficiency.  We  had  begun  work  on  this  second  method  but 
realized  early  that  from  this  potential  method  alone,  one  could  not  arrive  at 
consistent  initial  estimates  for  use  in  the  EGNLS  procedure.  This  is  due  to  the 
approximation  that  is  made  in  the  initial  step  of  this  method.  It  was  determined 
that  consistent  estimates  would  need  to  be  obtained  from  another  method,  e.g.,  the 
method  described  at  the  end  of  Chapter  4,  and  used  as  the  initial  estimates  under 
this  alternate  method.  Since  this  method  could  not  be  used  independently  of  any 
other,  it  was  decided  that  the  method  described  in  this  dissertation  was  of  greater 
possible  value,  because  it  alone  could  be  used  to  consistently  estimate  the  model 
parameters.  The  idea  behind  the  alternative  method  and  the  details  which  had 
been  worked  out  earlier  are  given  below. 

This method will be able to handle continuous, discrete, and multinomial-categorical
manifest variables, just as the method presented in this dissertation does.
Each manifest variable is assumed to have a known conditional (given x)
distribution, generically denoted as $f_r(y \mid x; \theta_r)$. The models for these variables would be
specified through the same mappings as those given in Chapter 3. Recall that these
mappings, for each manifest variable, may be written together in vector form as
$\theta_r = T^{-1}(h_r(x; \beta_r))$, as seen in Equation 3.4. This relation allows us to substitute
x and the model parameters, $\beta_r$, for each r, which describe the relation between $Y_r$
and x, in place of the conditional distribution parameters, $\theta_r$. These mappings have
the same properties as those defined in Chapter 3.


The  distribution  of  x has  a range  of  possible  values  from  negative  infinity 
to  positive  infinity.  In  the  case  for  our  motivating  example,  where  x is  the  true, 
unknown  radiation  dose,  the  range  of  x is  from  zero  to  infinity.  The  first  step  is 
to  apply  the  definition  of  the  Riemann  integral  (see,  for  example,  [34,  p.  206])  to 
rewrite  the  right  hand  side  of  Equation  4.1,  which  describes  the  joint  distribution 
of  the  manifest  variables  prior  to  categorization.  As  in  the  method  presented 
above,  we  initially  state  the  joint  marginal  distribution  in  its  full  generality,  but  for 
notational  convenience,  we  show  the  details  assuming  univariate  outcomes  where 
all  variables  satisfy  the  conditional  independence  assumption.  Recall  that  the  joint 
distribution  is 


$$f(y_1, y_2, \ldots, y_p) = \int_x \prod_{r=1}^{p} f_r(y_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx,$$


where $f_r(y_r \mid x, \theta_r)$ may be the conditional distribution function for a continuous,
discrete, or multinomial categorical manifest variable or any combination of
dependent variables which are grouped together to form a conditionally independent
group. First we must write that integral as the limit of an integral over a bounded
region as follows:


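$$\int_{-\infty}^{\infty} \prod_{r=1}^{p} f_r(y_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx
= \lim_{b \to \infty} \int_{-b}^{b} \prod_{r=1}^{p} f_r(y_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx. \qquad (7.1)$$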

Here, $f(x \mid \theta_x)$ represents the extended version of the density, defined as the density
function of x over its domain and zero elsewhere. Now let the interval $[-b, b]$ be
partitioned into a finite number of subintervals with endpoints defined by the set of
points $P = \{x_0, x_1, \ldots, x_k\}$ such that $-b = x_0 < x_1 < \cdots < x_k = b$. Further, let
the length of each interval be $2b/k$ so that the set P contains the $k + 1$ numbers
$x_j = -b + 2bj/k$, $j = 0, 1, \ldots, k$. Define $h_j = x_j - x_{j-1}$, $j = 1, 2, \ldots, k$, and $m_j$
as the midpoint of the subinterval $[x_{j-1}, x_j]$, $j = 1, 2, \ldots, k$. We define different
notation and a different procedure for the categorization of x from that described
in Chapter 4 because this potential solution utilizes the definition of the Riemann
integral and this new notation is needed to do so, since the length of the intervals
here must approach zero as $n \to \infty$. Note, also, that the intervals for x in this case
are required to be of equal length, whereas the categorization previously described
for x did not require intervals of equal length. The definition of the Riemann
integral applied to the right hand side of Equation 7.1 gives

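$$\lim_{b \to \infty} \int_{-b}^{b} \prod_{r=1}^{p} f_r(y_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx
= \lim_{b \to \infty} \lim_{k \to \infty} \sum_{j=1}^{k} \prod_{r=1}^{p} f_r(y_r \mid x = m_j, \theta_r)\, f(m_j \mid \theta_x)\, h_j. \qquad (7.2)$$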
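The midpoint sum on the right hand side of Equation 7.2 can be checked numerically in a case where the integral is known. The sketch below assumes, purely for illustration, that x is standard normal and that each manifest variable is conditionally normal with mean a_r + b_r x and standard deviation s_r. The names `a`, `b1`, and `s`, and the values of b and k, are hypothetical. Under these assumptions the joint marginal distribution is multivariate normal, so the sum can be compared with a closed form.

```python
import numpy as np

def normal_pdf(z, mean, sd):
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def midpoint_marginal(y, a, b1, s, b=8.0, k=400):
    """Midpoint sum over [-b, b] with k equal subintervals.

    Assumes x ~ N(0, 1) and Y_r | x ~ N(a[r] + b1[r] * x, s[r]**2),
    conditionally independent given x.
    """
    y, a, b1, s = (np.asarray(v, dtype=float) for v in (y, a, b1, s))
    edges = np.linspace(-b, b, k + 1)        # x_j = -b + 2*b*j/k
    h = np.diff(edges)                       # h_j = x_j - x_{j-1}
    m = 0.5 * (edges[:-1] + edges[1:])       # midpoints m_j
    # Product over r of f_r(y_r | x = m_j), one value per subinterval.
    cond = np.prod(
        normal_pdf(y[:, None], a[:, None] + b1[:, None] * m, s[:, None]),
        axis=0,
    )
    return float(np.sum(cond * normal_pdf(m, 0.0, 1.0) * h))

def exact_marginal(y, a, b1, s):
    """Multivariate normal density implied by the same assumptions."""
    y, a, b1, s = (np.asarray(v, dtype=float) for v in (y, a, b1, s))
    cov = np.outer(b1, b1) + np.diag(s ** 2)
    d = y - a
    quad = d @ np.linalg.solve(cov, d)
    norm = np.sqrt((2.0 * np.pi) ** len(y) * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)

args = ([0.5, -1.0], [0.0, 1.0], [1.0, 0.5], [1.0, 0.8])
print(midpoint_marginal(*args), exact_marginal(*args))
```

In this smooth example the two values agree to many decimal places for moderate b and k. Less regular conditional distributions could need a much finer partition.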

We  will  return  to  Equation  7.2,  but  first  we  shall  manipulate  Equation  4.1,  the 
joint  marginal  distribution  of  the  manifest  variables,  making  use  of  the  multinomial 
distribution  through  categorized  versions  of  the  continuous  and  discrete  manifest 
variables,  and  those  variables  which  were  already  categorical. 

For notational convenience, we shall assume all manifest variables are univariate
and conditionally independent given x. Without loss of generality, order
the manifest variables such that $Y_r$, $r = 1, 2, \ldots, p'$, are the continuous variables,
$Y_r$, $r = p' + 1, p' + 2, \ldots, p''$, are the “other” discrete variables, and $Y_r$,
$r = p'' + 1, p'' + 2, \ldots, p$, are the categorical variables which initially follow a
multinomial distribution. First, categorize each continuous and “other” discrete manifest
variable. Recall that the “other” discrete variables are those discrete variables
which do require further categorization, i.e., do not initially follow a multinomial
distribution. Categorize each $Y_r$, $r = 1, 2, \ldots, p''$, into $C_r$ categories, in a similar
manner as described in Chapter 4 for the GSME model. This categorization
results in univariate intervals, $I_{rj_r}$, having endpoints $(l_{rj_r}, u_{rj_r})$, $r = 1, 2, \ldots, p''$,
$j_r = 1, 2, \ldots, C_r$; however, now denote length$(I_{rj_r}) = h_{rj_r}$. This different notation
is due to the fact that the lengths of the intervals for this method have slightly
different properties than those for the method given in this dissertation. Previously,
the interval lengths were allowed to remain fixed as $n \to \infty$; in this case, the
intervals must shrink to a length of zero as $n \to \infty$. So here, $C_r$ will depend on
n. The reason for this will become apparent in details which follow. As before,
the categorical manifest variables that follow a multinomial distribution, i.e., $Y_r$,
$r = p'' + 1, p'' + 2, \ldots, p$, will remain unchanged and the number of their categories
will again be denoted by $C_r$, $r = p'' + 1, p'' + 2, \ldots, p$, where the categories are
indexed by $j_r$, $j_r = 1, 2, \ldots, C_r$.

Further, as in Chapter 4, let

$$\pi_{j_1 j_2 \cdots j_p} = P(Y_1 \in I_{1j_1}, \ldots, Y_{p''} \in I_{p''j_{p''}}, Y_{p''+1} = j_{p''+1}, \ldots, Y_p = j_p).$$


Define $\pi$ to be the vector of $N = \prod_{r=1}^{p} C_r$ distinct categories in the
cross-classification of the categorized versions of continuous and discrete $Y_r$,
$r = 1, 2, \ldots, p''$, and the categorical $Y_r$, $r = p'' + 1, p'' + 2, \ldots, p$. That is, $\pi$ is the
vector of distinct multinomial probabilities of falling into the categories defined by
the cells of the cross-classification. Assume $h_{rj_r} = o(1/n^{\delta})$ and that the first endpoint
and last endpoint for each $Y_r$, $r = 1, 2, \ldots, p''$, are such that $l_{r1} = o(-n^{\delta})$ and
$u_{rC_r} = o(n^{\delta})$, respectively, where $\delta > 0$ is sufficiently small so that $n\pi_{j_1 j_2 \cdots j_p} \to \infty$ as
$n \to \infty$, for all $j_1, j_2, \ldots, j_p$. Then, it can be shown that


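$$f(y_{1m_{j_1}}, y_{2m_{j_2}}, \ldots, y_{p''m_{j_{p''}}}, y_{p''+1}, \ldots, y_p)
= \lim_{\max_{j_1}(h_{1j_1}) \to 0} \cdots \lim_{\max_{j_{p''}}(h_{p''j_{p''}}) \to 0}
\frac{\pi_{j_1 j_2 \cdots j_p}}{\prod_{r=1}^{p''} h_{rj_r}},$$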

where $y_{rm_{j_r}}$, $r = 1, 2, \ldots, p''$, is chosen to be the midpoint of the $j_r$th interval of the
continuous and discrete $Y_r$, $r = 1, 2, \ldots, p''$, and $\pi$, again, is the vector of distinct
probabilities of the multinomial distribution defined by the cross-classification of
the categorized continuous and discrete manifest variables and those which were
already categorical. Therefore, let


$$y = (y_{1m_{j_1}}, y_{2m_{j_2}}, \ldots, y_{p''m_{j_{p''}}}, y_{p''+1}, \ldots, y_p)$$

and we define $\hat{f}(y) = \hat{\pi}\,(1/\prod_{r=1}^{p''} h_{rj_r})$, where the sample proportions, $\hat{\pi}$, are
unbiased and consistent estimates of the cell probabilities, $\pi$. Add and subtract
$\hat{f}(y)$ to the left hand side of Equation 4.1 giving


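$$f_Y(y_1, y_2, \ldots, y_p) - \hat{f}(y) + \hat{f}(y)
= \int_x \prod_{r=1}^{p'} f_r(y_r \mid x, \theta_r) \prod_{r=p'+1}^{p''} p_r(y_r \mid x, \theta_r) \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx,$$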


where $f_r(y_r \mid x, \theta_r)$ is a probability density function for a continuous $Y_r$,
$p_r(y_r \mid x, \theta_r)$ is a probability mass function for a discrete $Y_r$, and
$P(Y_r = j_r \mid x, \theta_r)$ represents conditional multinomial cell probabilities for a
categorical $Y_r$. This implies


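$$\hat{f}(y) = \int_x \prod_{r=1}^{p'} f_r(y_r \mid x, \theta_r) \prod_{r=p'+1}^{p''} p_r(y_r \mid x, \theta_r) \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx
+ \left[\hat{f}(y) - f_Y(y_1, y_2, \ldots, y_p)\right] \qquad (7.3)$$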


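where $\hat{f}(y) - f_Y(y_1, y_2, \ldots, y_p) = (1/\prod_{r=1}^{p''} h_{rj_r})\,\hat{\pi} - f_Y(y_1, y_2, \ldots, y_p)$ and $\hat{\pi} = \pi + \epsilon$,
where $\epsilon = O_p(1/\sqrt{n})$. So, from Equation 7.3, we have

$$\hat{f}(y) = \hat{\pi} \Big/ \prod_{r=1}^{p''} h_{rj_r}
= \int_x \prod_{r=1}^{p'} f_r(y_r \mid x, \theta_r) \prod_{r=p'+1}^{p''} p_r(y_r \mid x, \theta_r) \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x, \theta_r)\, f(x \mid \theta_x)\, dx
+ \Big(1 \Big/ \prod_{r=1}^{p''} h_{rj_r}\Big)(\pi + \epsilon) - f_Y(y_1, y_2, \ldots, y_p).$$

Multiply both sides by $\prod_{r=1}^{p''} h_{rj_r}$ giving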

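$$\hat{\pi} = \int_x \prod_{r=1}^{p'} f_r(y_r \mid x, \theta_r) \prod_{r=p'+1}^{p''} p_r(y_r \mid x, \theta_r) \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x, \theta_r)\, f(x \mid \theta_x) \prod_{r=1}^{p''} h_{rj_r}\, dx
+ \left[\pi - f_Y(y_1, y_2, \ldots, y_p) \prod_{r=1}^{p''} h_{rj_r}\right] + \epsilon \qquad (7.4)$$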


where, again,

$$\epsilon = \hat{\pi} - \pi = O_p(1/\sqrt{n})$$

and

$$\pi - f_Y(y_1, y_2, \ldots, y_p) \prod_{r=1}^{p''} h_{rj_r} = O\left(\Big(\prod_{r=1}^{p''} h_{rj_r}\Big)^{\delta+1}\right)$$

for some $\delta > 0$. This second fact can easily be seen in the univariate case, for a
single variable, where $\lim_{h_1 \to 0}(\pi/h_1 - f(y_1)) = 0$ implies there exists a $\delta > 0$ such that
$(\pi/h_1 - f(y_1)) = O(h_1^{\delta})$, implying $(\pi - f(y_1)h_1) = O(h_1^{\delta})h_1 = O(h_1^{\delta+1})$, where recall that
$h_1$ is the length of an interval. Finally, plugging the right hand side of Equation 7.2
into the right hand side of Equation 7.4 gives


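$$\hat{\pi} = \lim_{b \to \infty} \lim_{k \to \infty} \sum_{j=1}^{k} \prod_{r=1}^{p'} f_r(y_r \mid x = m_j, \theta_r)
\times \prod_{r=p'+1}^{p''} p_r(y_r \mid x = m_j, \theta_r) \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x = m_j, \theta_r)\, f(m_j \mid \theta_x)\, h_j \prod_{r=1}^{p''} h_{rj_r}
+ O\left(\Big(\prod_{r=1}^{p''} h_{rj_r}\Big)^{\delta+1}\right) + \epsilon. \qquad (7.5)$$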

Dropping  the  limits  on  the  right  hand  side  of  the  preceding  equation  gives 


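$$\hat{\pi} = \sum_{j=1}^{k} \prod_{r=1}^{p'} f_r(y_r \mid x = m_j, \theta_r)
\times \prod_{r=p'+1}^{p''} p_r(y_r \mid x = m_j, \theta_r) \prod_{r=p''+1}^{p} P(Y_r = j_r \mid x = m_j, \theta_r)\, f(m_j \mid \theta_x)\, h_j \prod_{r=1}^{p''} h_{rj_r}
+ O\left(\Big(\prod_{r=1}^{p''} h_{rj_r}\Big)^{\delta+1}\right) + \epsilon + o\Big(\prod_{j=1}^{k} h_j\Big) + B_b, \qquad (7.6)$$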


where we may assume $O((\prod_{r=1}^{p''} h_{rj_r})^{\delta+1})$ is negligible, for sufficiently small $h_{rj_r}$,
$r = 1, 2, \ldots, p''$, $j_r = 1, 2, \ldots, C_r$, $o(\prod_{j=1}^{k} h_j)$ is negligible, for sufficiently small $h_j$,
$j = 1, 2, \ldots, k$, and $B_b$ represents the integral over the range of negative infinity
to negative b and b to infinity, i.e., the sum of the tail areas, which is negligible
for sufficiently large b. This is due to the fact that
$\prod_r f_r(y_r \mid x, \theta_r) \prod_r p_r(y_r \mid x, \theta_r) \prod_r P(Y_r = j_r \mid x, \theta_r)\, f(x \mid \theta_x)$
is the joint distribution of $y_r$, $r = 1, 2, \ldots, p$,
and x and therefore is theoretically integrable with respect to the continuous
variable x. At this point in the estimation procedure, the model parameters
would enter the right hand side of Equation 7.6 by substituting for the conditional
distribution parameters, those mappings which are functions of x and the model
parameters, i.e., substituting the elements of $\theta_r$ with the elements of $T^{-1}h_r(x; \beta_r)$.
Then the conditional distributions could be written, for example for a continuous
manifest variable, as $f_r(y_r \mid x, \beta_r)$, indicating the substitution has occurred.
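The order of the remainder that is treated as negligible can be examined in the univariate case with a short calculation. The sketch below assumes a standard normal variable, which is our choice for illustration only, and computes the exact cell probability of an interval minus the density at its midpoint times its length.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cell_remainder(y_mid, h):
    """Exact cell probability minus f(y_mid) * h for a standard normal.

    The interval has length h and is centered at y_mid.
    """
    pi = normal_cdf(y_mid + h / 2.0) - normal_cdf(y_mid - h / 2.0)
    return pi - normal_pdf(y_mid) * h

for h in (0.4, 0.2, 0.1, 0.05):
    print(h, cell_remainder(0.5, h))
```

In this example, halving the interval length divides the remainder by roughly eight, which corresponds to a remainder of order h cubed, that is, delta = 2 in the notation above. Other distributions or other midpoints may give a different order.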

Since the distributions of the manifest variables are assumed to be known,
specific forms for any example can be inserted into the right hand side of Equation 7.6
in terms of the model parameters which enter nonlinearly from these distributions.
The resulting system of equations is solved via iterative generalized nonlinear least
squares using the same method as that described in Chapter 4, by ignoring the
$O((\prod_{r=1}^{p''} h_{rj_r})^{\delta+1})$, $o(\prod_{j=1}^{k} h_j)$, and $B_b$ terms.
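A toy version of this fitting step is sketched below. It is not the EGNLS procedure of Chapter 4: it uses unweighted Gauss-Newton iterations rather than generalized least squares, and a single continuous manifest variable. The model is assumed for illustration only, with x standard normal and Y given x normal with mean beta times x and unit variance. All function names and tuning values are hypothetical. The mean function is the right hand side of Equation 7.6 with the remainder terms dropped.

```python
import numpy as np

def normal_pdf(z, mean=0.0, sd=1.0):
    return np.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def mean_function(beta, y_mid, h_y, b=8.0, k=200):
    """Approximate cell probabilities with the remainder terms dropped.

    Toy model: x ~ N(0, 1) and Y | x ~ N(beta * x, 1).
    """
    edges = np.linspace(-b, b, k + 1)
    h_x = np.diff(edges)
    m = 0.5 * (edges[:-1] + edges[1:])
    cond = normal_pdf(y_mid[:, None], beta * m[None, :], 1.0)  # f(y | x = m_j)
    return (cond * normal_pdf(m) * h_x).sum(axis=1) * h_y

def fit_beta(pi_hat, y_mid, h_y, beta0=0.5, iters=25, eps=1e-5):
    """Unweighted Gauss-Newton iterations for the scalar beta."""
    beta = beta0
    for _ in range(iters):
        resid = pi_hat - mean_function(beta, y_mid, h_y)
        # Central-difference approximation to the derivative in beta.
        jac = (mean_function(beta + eps, y_mid, h_y)
               - mean_function(beta - eps, y_mid, h_y)) / (2.0 * eps)
        beta += float(jac @ resid) / float(jac @ jac)
    return beta

def simulate_pi_hat(beta, n, edges, seed=0):
    """Sample cell proportions of Y over the given interval edges."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    counts, _ = np.histogram(y, bins=edges)
    return counts / n

edges_y = np.linspace(-6.0, 6.0, 49)            # 48 intervals of length 0.25
y_mid = 0.5 * (edges_y[:-1] + edges_y[1:])
pi_hat = simulate_pi_hat(beta=1.0, n=100_000, edges=edges_y)
print(fit_beta(pi_hat, y_mid, h_y=0.25))
```

With one manifest variable only the magnitude of beta is determined in this toy model, so the positive starting value selects the positive solution. The small discrepancy that remains between the fitted and the true value reflects both sampling error and the dropped remainder terms.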

The resulting estimators should be biased. The reason for the bias in the
estimators comes from the fact that the $o(\prod_{j=1}^{k} h_j)$ and $B_b$ terms are dropped in
Equation 7.6, causing a slight shift in the likelihood. The $o(\prod_{j=1}^{k} h_j)$ term and
the $B_b$ term are needed for Equation 7.6 to be the correct mean function. The
univariate case was presented for notational ease; however, just as in the method
presented in this dissertation, the concept can be extended to the multivariate case,
allowing for groups of manifest variables conditionally dependent with each other,
but as a group, conditionally independent, given x, of the remaining variables.


REFERENCES 


[1]  Agresti,  A.  (1990)  Categorical  Data  Analysis,  New  York:  Wiley. 

[2]  Aitkin,  M.  and  Rocci,  R.  (2002),  “A  General  Maximum  Likelihood  Analysis  of 
Measurement  Error  in  Generalized  Linear  Models,”  Statistics  and  Computing, 
12(2),  163-174. 

[3]  Alfrey,  A.  C.,  LeGendre,  G.  R.,  and  Kaehny,  W.  D.  (1976),  “The  Dialysis 
Encephalopathy  Syndrome:  Possible  Aluminum  Intoxication,”  The  New 
England  Journal  of  Medicine,  294(4),  184-188. 

[4] Alzheimer’s Disease and Related Disorders Association, Inc. (2001),
Alzheimer’s Disease Statistics, Chicago: Author.

[5] Amemiya, Y. (1985), “Instrumental Variable Estimation for the Nonlinear
Errors-in-Variables Model,” Journal of Econometrics, 28, 273-289.

[6]  Armstrong,  B.  (1985),  “Measurement  Error  in  the  Generalized  Linear  Model,” 
Communications  in  Statistics:  Simulation  and  Computation,  14(3),  529-544. 

[7] Burquet, A., Monnet, E., Roth, P., Hirn, F., Vouaillat, C., Lecourt-Ducret,
M. et al. (2000), “Neurodevelopmental Outcomes of Premature Infants Born
Less than 33 Weeks of Gestational Age and Not Cerebral Palsy at the Age of 5
Years,” Archives de Pediatrie, 7(4), 357-368.

[8] Bushinsky, D. A., Sprague, S. M., Hallegot, P., Girod, C., Chabala, J. M., and
Levi-Setti, R. (1995), “Effects of Aluminum on Bone Surface Ion
Composition,” Journal of Bone and Mineral Research, 10(12), 1988-1997.

[9] Buzas, J. S. (1997), “Instrumental Variable Estimation in Nonlinear
Measurement Error Models,” Communications in Statistics: Theory and Methodology,
26(12), 2861-2877.

[10]  Buzas,  J.  S.  and  Stefanski,  L.  A.  (1996),  “Instrumental  Variable  Estimation  in 
Generalized  Linear  Models,”  Journal  of  the  American  Statistical  Association, 
91(435),  999-1006. 

[11]  Buzas,  J.  S.  and  Stefanski,  L.  A.  (1996),  “Instrumental  Variable  Estimation 
in  a Probit  Measurement  Error  Model,”  Journal  of  Statistical  Planning  and 
Inference,  55,  47-62. 

[12]  Campbell,  A.  (2002),  “The  Potential  Role  of  Aluminum  in  Alzheimer’s 
Disease,”  Nephrology,  Dialysis,  Transplantation,  17,  17-20. 


[13] Carroll, R. J., Ruppert, D., and Stefanski, L. A. (1995) Measurement Error in
Nonlinear Models, London: Chapman and Hall.

[14]  Carroll,  R.  J.,  Spiegelman,  C.  H.,  Lan,  K.,  Bailey,  K.  and  Abbott,  R.  (1984), 
“On  Errors  in  Variables  for  Binary  Regression  Models,”  Biometrika,  71(1), 
19-25. 

[15] Carroll, R. J. and Stefanski, L. A. (1990), “Approximate Quasi-Likelihood
Estimation in Models with Surrogate Predictors,” Journal of the American
Statistical Association, 85(411), 652-663.

[16] Carter, R. L. (1976), Instrumental Variable Estimation of the Simple Errors
in Variables Model, unpublished Ph.D. dissertation, Iowa State University,
Department of Statistics.

[17]  Carter,  R.  L.  (1981),  “Restricted  Maximum  Likelihood  Estimation  of  Bias  and 
Reliability  in  the  Comparison  of  Several  Measuring  Methods,”  Biometrics, 
37(4),  733-741. 

[18] Carter, R. L. and Fuller, W. A. (1980), “Instrumental Variable Estimation of
the Simple Errors-in-Variables Model,” Journal of the American Statistical
Association, 75(371), 687-692.

[19]  Cook,  J.  R.  and  Stefanski,  L.  A.  (1994),  “Simulation-Extrapolation  Estimation 
in  Parametric  Measurement  Error  Models,”  Journal  of  the  American  Statistical 
Association,  89(428),  1314-1328. 

[20]  Dennis,  J.  E.  and  Schnabel,  R.  B.  (1983)  Numerical  Methods  for  Unconstrained 
Optimization  and  Nonlinear  Equations,  New  Jersey:  Prentice-Hall. 

[21]  Foutz,  R.V.  (1977),  “On  The  Unique  Consistent  Solutions  to  the  Likelihood 
Equations,”  Journal  of  the  American  Statistical  Association,  72(357),  147-148. 

[22]  Fuller,  W.  A.  (1976)  Introduction  to  Statistical  Time  Series,  New  York:  Wiley. 

[23]  Fuller,  W.  A.  (1980),  “Properties  of  Some  Estimators  for  the  Errors  in 
Variables  Model,”  The  Annals  of  Statistics,  8(2),  407-422. 

[24]  Fuller,  W.  A.  (1987)  Measurement  Error  Models,  New  York:  Wiley. 

[25] Geerlings, M. I., Ruitenberg, A., Witteman, J. C., Van Swieten, J. C.,
Hofman, A., Van Duijn, C. M., Breteler, M. M. and Launer, L. J. (2001),
“Reproductive Period and Risk of Dementia in Postmenopausal Women,” Journal of
the American Medical Association, 285, 1475-1481.

[26]  Gleser,  L.  J.  (1990),  “Improvements  of  the  Naive  Approach  to  Estimation 
in  Nonlinear  Errors-in-Variables  Regression  Models,”  In  Contemporary 
Mathematics  Volume  112:  Statistical  Analysis  of  Measurement  Error  Models 
and  Applications,  eds.  P.  J.  Brown  and  W.  A.  Fuller,  99-114. 


[27] Greene, R. A. (2000), “Estrogen and Cerebral Blood Flow; A Mechanism to
Explain the Impact of Estrogen on the Incidence and Treatment of Alzheimer’s
Disease,” International Journal of Fertility and Women’s Medicine, 45, 253-257.

[28] Greenland, S. (2000), “An Introduction to Instrumental Variables for
Epidemiologists,” International Journal of Epidemiology, 29(4), 722-729.

[29] Griliches, Z. and Ringstad, V. (1970), “Error-in-the-Variables Bias in
Nonlinear Contexts,” Econometrica, 38(2), 368-370.

[30] Hack, M. and Fanaroff, A. A. (2000), “Outcomes of Children of Extremely
Low Birthweight and Gestational Age in the 1990’s,” Seminars in Neonatology,
5(2), 89-106.

[31] Hartley, H. O. (1961), “The Modified Gauss-Newton Method for the Fitting of
Non-linear Regression Functions by Least Squares,” Technometrics, 3, 269-280.

[32] Honig, L. S. and Mayeux, R. (2001), “Natural History of Alzheimer’s Disease,”
Aging-Clinical and Experimental Research, 13, 171-182.

[33] Kalbfleisch, J. D. and Prentice, R. L. (1980) The Statistical Analysis of Failure
Time Data, New York: Wiley.

[34]  Khuri,  A.  I.  (1993)  Advanced  Calculus  with  Applications  in  Statistics,  New 
York:  Wiley. 

[35]  Klein,  J.  P.  and  Moeschberger,  M.  L.  (1997)  Survival  Analysis  Techniques  for 
Censored  and  Truncated  Data,  New  York:  Springer. 

[36] Klepper, S. and Leamer, E. E. (1984), “Consistent Sets of Estimates for
Regressions with Errors in all Variables,” Econometrica, 52(1), 163-183.

[37] Larsen, T., Nguyen, T. H., Griesen, G., Engholm, G., and Moller, H. (2000),
“Does a Discrepancy Between Gestational Age Determined by Biparietal
Diameter and Last Menstrual Period Sometimes Signify Early Intrauterine
Growth Retardation?,” BJOG: An International Journal of Obstetrics and
Gynaecology, 107(2), 238-244.

[38] Lazarsfeld, P. F. and Henry, N. W. (1968) Latent Structure Analysis, Boston:
Houghton Mifflin.

[39]  Mackanjee,  H.  R.,  Iliescu,  B.  M.,  Dawson,  W.  B.  (1996),  “Assessment  of 
Postnatal  Gestational  Age  Using  Sonographic  Measurement  of  Femur  Length,” 
Journal  of  Ultrasound  in  Medicine,  15(2),  115-120. 

[40]  McCullagh,  P.  and  Nelder,  J.  A.  (1989)  Generalized  Linear  Models,  Boca 
Raton:  Chapman  and  Hall/CRC. 


[41]  Neriishi,  K.,  Stram,  D.,  Vaeth,  M.,  Mizuno,  S.  and  Akiba,  S.  (1991),  “The 
Observed  Relationship  Between  the  Occurrence  of  Acute  Radiation  Effects  and 
Leukemia  Mortality  among  A-Bomb  Survivors,”  Radiation  Research,  125(2), 
206-213. 

[42]  Pierce,  D.  A.,  Stram,  D.  O.,  and  Vaeth,  M.  (1990),  “Allowing  for  Random 
Errors  in  Radiation  Dose  Estimates  for  the  Atomic  Bomb  Survivor  Data,” 
Radiation  Research,  123(3),  275-284. 

[43] Pierce, D. A., Stram, D. O., Vaeth, M., and Schafer, D. (1992), “The Errors in
Variables Problem: Considerations Provided by the Radiation Dose-Response
Analyses of the A-Bomb Survivor Data,” Journal of the American Statistical
Association, 87(418), 351-359.

[44] Prentice, R. L. (1982), “Covariate Measurement Errors and Parameter
Estimation in a Failure Time Regression Model,” Biometrika, 69(2), 331-342.

[45]  Rao,  C.  R.  (1973)  Linear  Statistical  Inference  and  its  Applications,  New  York: 
Wiley. 

[46] Reiersol, O. (1950), “Identifiability of a Linear Relation Between Variables
which are Subject to Error,” Econometrica, 18, 375-389.

[47] Resnick, M. B., Ariet, M., Carter, R. L., Bucciarelli, R. L., Furlough, R.,
Evans, J. H., et al. (1988), “Prospective Pricing Model for Neonatologists and
Obstetricians in Tertiary Care Centers,” Pediatrics, 82, 442-446.

[48]  Rudin,  W.  (1976)  Principles  of  Mathematical  Analysis,  Singapore:  McGraw- 
Hill. 

[49]  Schott,  J.  R.  (1997)  Matrix  Analysis  for  Statistics,  New  York:  Wiley. 

[50]  Sen,  P.  K.  and  Singer,  J.  M.  (1993)  Large  Sample  Methods  in  Statistics:  An 
Introduction  with  Applications,  New  York:  Chapman  and  Hall. 

[51]  Serfling,  R.  J.  (1980)  Approximation  Theorems  of  Mathematical  Statistics,  New 
York:  Wiley. 

[52] Shah, N. (1998), Estimated Generalized Nonlinear Least Squares for Latent
Class Analysis of Diagnostic Tests, unpublished Ph.D. dissertation, University
of Florida, Department of Statistics.

[53]  Smith,  L.  N.,  Dayal,  V.  H.,  and  Monga,  M.  (1999),  “Prior  Knowledge  of 
Obstetric  Gestational  Age  and  Possible  bias  of  Ballard  Score,”  Obstetrics  and 
Gynecology,  93(5),  712-714. 

[54] Sposto, R., Stram, D. O., and Awa, A. A. (1991), “An Estimate of the
Magnitude of Random Errors in the DS86 Dosimetry from Data on Chromosome
Aberrations  and  Severe  Epilation,”  Radiation  Research,  128(2),  157-169. 


[55]  Stefanski,  L.  A.  (1985),  “The  Effects  of  Measurement  Error  on  Parameter 
Estimation,” Biometrika, 72(3), 583-592.

[56]  Stefanski,  L.  A.  (2000),  “Measurement  Error  Models,”  Journal  of  the  American 
Statistical  Association,  95(452),  1353-1358. 

[57]  Stefanski,  L.  A.  and  Boos,  D.  D.  (2002),  “The  Calculus  of  M-Estimation,”  The 
American Statistician, 56(1), 29-38.

[58]  Stefanski,  L.  A.  and  Buzas,  J.  S.  (1995),  “Instrumental  Variable  Estimation 
in  Binary  Regression  Measurement  Error  Models,”  Journal  of  the  American 
Statistical  Association,  90(430),  541-550. 

[59] Tang, M. X., Cross, P., Andrews, H., Jacobs, D. M., Small, S., Bell, K.,
Merchant,  C.,  Lantigua,  R.,  Costa,  R.,  Stern,  Y.,  and  Mayeux,  R.  (2001), 
“Incidence  of  AD  in  African-Americans,  Caribbean  Hispanics,  and  Caucasians 
in Northern Manhattan,” Neurology, 56, 49-56.

[60]  Thompson,  J.  R.,  Carter,  R.  L.,  Ariet,  M.,  Roth,  J.,  Ross,  N.,  and  Resnick,  M. 
B.  (2001),  “An  Assessment  of  Birth  Weight  and  Social  Demographic  Effects 
on  Early  Childhood  Disability  or  Developmental  Delay,”  Florida  Health  Care 
Journal,  2(1),  7-15. 

[61]  Thompson,  J.  R.,  Carter,  R.  L.,  Edwards,  A.  R.,  Roth,  J.,  Ariet,  M.,  Ross,  N., 
and  Resnick,  M.  B.,  “A  Population  Based  Study  of  the  Effects  of  Birth  Weight 
on  Early  Developmental  Delay  or  Disability  in  Children,”  American  Journal  of 
Perinatology, to appear.

[62]  Thoresen,  M.  and  Laake,  P.  (1999),  “Instrumental  Variable  Estimation  in 
Logistic Measurement Error Models by Means of Factor Scores,”
Communications in Statistics: Theory and Methodology, 28(2), 297-313.

[63]  Upton,  G.  and  Cook,  I.  (2002)  A Dictionary  of  Statistics,  New  York:  Oxford 
University  Press. 

[64]  Wolfe,  M.  A.  (1978)  Numerical  Methods  for  Unconstrained  Optimization:  an 
Introduction,  New  York:  Van  Nostrand  Reinhold. 

[65]  Wolter,  K.  M.  and  Fuller,  W.  A.  (1982),  “Estimation  of  Nonlinear  Errors  in 
Variables  Models,”  The  Annals  of  Statistics,  10(2),  539-548. 

[66]  Woods,  N.  S.,  Marlow,  N.,  Costeloe,  K.,  Gibson,  A.  T.,  and  Wilkinson,  A.  R. 
(2000),  “Neurologic  and  Developmental  Disabilities  After  Extremely  Preterm 
Birth,”  New  England  Journal  of  Medicine,  343(6),  378-384. 

[67]  Yokel,  R.  A.  and  O’Callaghan,  J.  P.  (1998),  “An  Aluminum-Induced  Increase 
in  GFAP  is  Attenuated  by  Some  Chelators,”  Neurotoxicology  and  Teratology, 
20(1),  55-60. 


BIOGRAPHICAL  SKETCH 

Jeffrey  Ray  Thompson  was  born  June  17,  1973,  to  Ron  and  Ellie  Thompson 
in  Natrona  Heights,  Pennsylvania.  He  has  one  elder  sister,  Jennifer.  After  living 
in  both  Pennsylvania  and  Ohio  for  a few  short  years,  Jeff’s  family  relocated 
to  Orlando,  Florida.  Jeff  spent  the  majority  of  his  adolescence  growing  up  in 
Orlando,  where  he  graduated  from  Lake  Brantley  High  School  in  1991.  He  went 
to Stetson University in DeLand, Florida, where he obtained a Bachelor of Science
degree in mathematics in 1995. It was while attending Stetson that Jeff developed
his interest in the field of statistics. His Master of Science degree in statistical
computing was granted by the University of Central Florida in Orlando in 1997, and
it was through the opportunity to teach undergraduate statistics while working on
his master’s degree that Jeff developed his passion for teaching. Desiring a career
in  academia,  Jeff  decided  to  pursue  his  Ph.D.  in  statistics  at  the  University  of 
Florida,  in  Gainesville,  where  he  intends  to  graduate  in  2003.  He  has  accepted  an 
Assistant  Professor  teaching  position  with  the  Department  of  Statistics  at  North 
Carolina  State  University,  in  Raleigh,  NC. 



I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and 
quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 

Randy  L.  Carter,  Chair 
Professor  of  Statistics 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and 
quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Malay Ghosh
Distinguished Professor of Statistics


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.

Ramon  C.  Littell 
Professor  of  Statistics 


I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable  standards  of  scholarly  presentation  and  is  fully  adequate,  in  scope  and 
quality,  as  a dissertation  for  the  degree  of  Doctor  of  Philosophy. 


Ronald  H.  Randles 
Professor  of  Statistics 

I certify  that  I have  read  this  study  and  that  in  my  opinion  it  conforms  to 
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.

Michael  B.  Resnick 
Professor  of  Pediatrics 


This  dissertation  was  submitted  to  the  Graduate  Faculty  of  the  Department  of 
Statistics  in  the  College  of  Liberal  Arts  and  Sciences  and  to  the  Graduate  School 
and  was  accepted  as  partial  fulfillment  of  the  requirements  for  the  degree  of  Doctor 
of  Philosophy. 

August  2003  


Dean,  Graduate  School 


